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INRIA  Sophia-Antipolis 


Contract  DAJA45-90-C-0008 


Nonlinear  Filtering 
and  Approximation  Techniques 


This  report  presents  the  results  obtained  by  the  contractors  in  the  study  of  partially  observed 
diffusions.  Because  nonlinear  filtering  is  a  central  theme  in  the  works  reported  below,  we  begin 
with  a  brief  presentation  of  this  problem. 

Let  {(-Y(,  y'() ,  /  >  0}  be  a  pair  of  stochastic  processes  satisfying 


d.\,  =  b{  X,  )dl  +  a(  A', )  dW,  +  p{  A, )  dV, 
dV,  =  h(X,)dt  +  dVt 


(1) 


where  t  >  0}  are  two  independent  Wiener  processes,  and  the  initial  state  .Vq  is  a 

random  variable  independent  of  <  >  0}.  The  process  {A', .  1  >  0)  is  not  observed.  Wo 

observe  {I'l ,  1  >  0}  and  we  seek  to  estimate  the  current  state  A',  given  the  information  available 
at  time  t,  i.e.  given  =  clT, ,  0  <  s  <  t). 

Note  that  the  choice  of  the  above  model  (1)  means  that  the  state  process  {.V, .  (  >  0}  is  a 
continuous  Markov  process. 

The  best  estimate  in  the  mean  square  error  sense,  of  any  function  of  the  unknown  r.v.  .V, 
say  d>{X,),  based  on  is  the  conditional  mean  |  >’,J.  and  computing  this  quantity  for 

any  function  (p{-)  reduces  to  computing  the  conditional  law  of  .V,  given  Assuming  that  this 
conditional  law  has  a  density  with  respect  to  the  Lebesgue  measure,  it  is  well  known  that  an 
unnormalized  version  P((x)  of  the  conditional  density  satisfies  a  recursive  equation,  actually  a 
stochastic  PDE  called  the  Zakai  equation 


dp,  =  L’Vidt  +  Y^BlndYl^ 


(2) 


where  L'  and  are  the  adjoint  in  the  sense  of  the  partial  differential  operators 


■  .;=! 


dx,dij 


dx. 


and  =  hi,  +  Y,P'k-fr- 


respectively.  Note  that  at  each  time  t,  pt(  )  is  a  random  function  of  the  state  variable  i.  i.e.  a 
random  element  of  an  infinite  dimensional  space.  This  is  of  course  a  serious  problem  for  practical 
implementation. 

Let  us  describe  the  results  which  have  been  obtained  during  the  period  covered  by  the  present 
contract. 
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ZPB 


The  purpose  of  this  software  is  to  make  available  the  experience  accumulated  by  the  contractors 
about  numericaJ  techniques  in  nonlinear  filtering.  The  basic  idea  is,  given  a  formal  description  of 
both  the  model  ( 1 )  and  the  problem  to  be  solved  (e.g.  filtering,  smoothing,  hypotheses  testing, 
etc.),  to  produce  a  program  for  the  numerical  solution  of  the  corresponding  Zakai  equation  (2). 

The  current  version  of  ZPB  is  based  on  the  computer  algebra  system  Maple'.  The  main 
improvement  over  previous  versions,  is  the  existence  of  a  user  interface  and  graphical  tools 
based  on  the  X  Window  system^.  The  interface  guides  the  user  through  the  following  steps 

o  definition/modification  of  the  model  and  the  problem  to  be  solved,  and  automatic  gener¬ 
ation  of  the  corresponding  Fortran  program, 

o  definition/modification  of  numerical  values  related  with  either  the  model  or  the  algorithm, 
execution  of  the  Fortran  program,  and  visualization  of  the  results. 

o  saving/loading  of  interesting  examples. 

Graphical  tools  are  available  to  visualize  the  results  of  the  computation 

o  a  representation  of  both  the  simulated  state  trajectory  and  the  estimated  state  trajectory 
vs.  time  is  provided  in  a  first  window,  as  well  as  a  shaded  area  representing  some  confidence 
region  for  the  conditional  distribution  at  a  given  level, 

o  a  time  can  be  selected  in  the  first  window,  and  a  representation  of  the  conditional  density 
at  the  selected  time  is  then  provided  in  a  second  window, 

o  it  is  also  possible  to  visualize  in  the  second  window,  the  continuous  time  evolution  of  the 
conditional  density. 

A  short  document  presenting  what  is  currently  available  in  ZPB,  and  what  is  to  be  developed 
in  the  near  future,  is  joint  to  the  report. 

2  Discretization  of  the  Zakai  equation 

We  are  interested  in  studying  numerical  methods  for  the  approximate  solution  of  the  Zakai 
equation  (2) 

d 

dp,  =  L‘p,dt+j2B:pt<iy,‘‘ 

k=\ 

where  I  is  a  second  order  partial  differential  operator,  and  Bk  are  first  order  partial  differential 
operators. 

‘Maple  is  a  registered  trademark  of  Waterloo  Maple  Software. 

^The  X  Window  system  is  a  trademark  of  the  MIT. 
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In  the  case  of  independent  noise,  where  p  =  0  in  equation  ( 1 )  so  that  Bk  =  hi^  are  "zero-th 
order”  partial  differential  operators,  we  have  studied  in  [10]  time  discretization  schemes  based 
on  splitting-up  approximation,  with  error  estimate  of  order  0(A)  and  0(A^/^)  where  A  is  the 
time  step.  A  probabilistic  interpretation  of  the  discrerization  schemes  was  also  provided. 

In  the  case  of  correlated  noise,  we  have  proposed  in  [3]  a  time  discretisation  scheme  based 
on  the  same  splitting-up  approach,  with  error  estimate  of  order  0(A).  The  correction  part 
in  the  splitting-up  approximation,  is  related  with  a  degenerate  stochastic  PDE.  for  which  a 
representation  result  in  terms  of  stochastic  characteristics  can  be  found  in  [9]  and  [8].  We 
have  obtained  a  time  discretization  scheme  for  the  degenerate  stochastic  PDE  based  on  Euler 
approximation  of  the  stochastic  characteristics,  with  error  estimate  of  order  0(  A'/^). 

Finally,  extending  the  results  of  Raviart  in  the  deterministic  case  [20],  we  have  proposed 
in  [4]  a  space  discretization  based  on  particle  appro.ximation ,  for  first  order  stochastic  PDE  in 
Stratonovich  form,  which  are  degenerate  second  order  stochastic  PDE. 

In  relation  with  the  design  of  finite  time  observers  for  deterministic  partially  observed  systems 
presented  in  [5],  full  discretization  schemes  have  been  introduced  in  the  special  case  of  noise  free 
state  equations,  where  both  <7  =  0  and  p  =  0  in  equation  (1),  see  [7]. 


3  Filtering  of  piecewise  linear  systems 


□  Continuous  time  systems 


We  consider  a  multi-dimensional  stochastic  system  with  dim  .X’  =  dim  >'  =  tti.  described  by 


d.V,  —  b{.\t) dt -y  d\  I  +  g{Xt)  d\\  t 

dV,  =  h(X,)dl  +  edW, 


(3) 


where  f  is  a  small  parameter. 

Our  interest  is  for  the  situation  where  the  coefficients  are  linear  (or  constant)  on  each  com¬ 


ponent  of  a  finite  polyhedral  partition  (0,,  1  < 
simplicity  we  assume  that  /  =  2.  i.e.  0_  =  {i  : 
u  is  some  non  zero  vect<  r  of  R"‘.  Let  A  =  ; 

We  assume  that  the  coefficients  of  (3)  satisfy 


b(i)  = 

X  e  &- 

B.i 

f{x)  = 

9{x)  = 

G_ 

h{x}  = 

H.r 

i  <  1}  of  the  state  space  R”*.  For  the  sake  of 
(i.u)  <  0)  and  0+  =  [i  :  (c,u)  >  0}  where 
(i,  u)  =  0}  denote  the  separating  hyperplane. 

r  e  0+ 

B^x 


and  we  assume  that  both  //_  and  //+  are  invertible.  The  case  where  h[  )  is  one-to-one  has 
been  considered  in  the  previous  contract,  and  we  asume  here  that  h(  )  is  not  globally  injective. 
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On  each  of  the  half  spaces  0_  and  0+  we  have  a  linear  system  with  non  Gaussian  initial 
condition.  If  we  knew  that  the  state  would  remain  in  a  given  half  space  for  a  certain  time 
interval,  then  it  would  be  natural  to  approximate  the  optimal  nonlinear  filter  by  the  Kalman- 
Bucy  filter  associated  with  that  half  space.  The  design  of  an  approximate  filter  is  based  on  this 
idea. 

o  two  Kalman  filters  X*  and  A',“  are  considered  which  are  associated  with  the  two  linear 
systems  corresponding  to  the  original  piecewise  linear  system, 

o  a  first  test  is  used  to  find  a  time  interval  [a,  6]  such  that  A’l  does  not  cross  the  separating 
hyperplane  A  in  [a,i], 

o  provided  that  such  a  time  interval  [a, 6]  has  been  found,  a  second  test  is  used  to  decide 
whether  A',  6  0_  or  A',  6  0+  on  (a,h],  i.e.  to  decide  which  Kalman  filter  or  A,”  to 

follow  on  [a,f/]. 

We  have  proved  in  [19]  that  an  hyperplane-crossing  test  can  be  designed  with  exponentially 
small  probability  of  error,  and  that  it  is  possible  to  design  a  test  to  decide  between  0.  and  0+, 
under  either  one  of  the  following  detectability  hypothesis 

(DH,)  H.Z.  Hi  ^  HX 


(DH,) 


r  =  H.z.  h:  =  //+  E+  HI 

kerlff-fi.  H:'  - 

the  matrix  P"*  [//-  B.  Hz'  -  B+  is  symmetric 


where  E_  =  F-  FZ  +  6'_  G'l  and  similarily  for  II+. 

The  main  difference  between  the  two  detectability  hypothesis  is  that  under  (Dll]),  we  can 
decide  almost  instantaneously  with  an  exponentially  small  probabibty  of  error,  whereas  under 
(DHj),  we  need  the  interval  [n.6]  to  be  long  enough  (actually  almost  infinite)  in  order  to  get  an 
exponentially  small  probability  of  error. 

Some  examples  in  the  ca.se  where  dim  A’  >  dim  V  have  been  also  considered  in  [14],  [iSj. 
and  [19]. 


□  Discrete  time  systems 

W’e  consider  a  one-dimensional  discrete  time  stochastic  dynamical  system  described  by 

i*+i  =  Xk  + cb{xi,)  + y/ea(xi,)wk 

Vk  =  h{xi,)  +  ^/evk 

where  c  is  a  small  parameter.  Such  a  system  results  e.g.  from  the  discretization,  with  time  step 
At  =  £,  of  a  continuous  time  system  with  small  observation  noise,  such  as  (3)  above. 


4 


/ 


Our  interest  is  for  the  situation  where  the  coefficients  are  piecewise  linear  (or  constant)  i.e. 


I  <  0  I  >  0 

6(i)  =  B_i  B+x 

a(x)  =  (7-  a+ 

/i(i)  =  H-X  H-X 


In  the  case  where  h(-)  is  not  one-to-one  (i.e.  /f_  <  0),  we  introduce  the  following  detectabil¬ 

ity  hypotheses 


(DH.) 


Hi  ol  /  Hi  ol 


(DHj) 


Hi  ol  =  Hi  al 
B.  #  B+ 


The  case  where  hypothesis  (Dili)  holds  has  been  considered  in  [2].  Under  hypothesis  (DH2).  we 
have  proved  in  [15]  that  an  efficient  approximate  filter  can  be  built,  which  is  based  on  the  same 
following  idea  than  in  the  continuous  lime  situation 


o  two  Kalman  filters  and  are  considered  which  are  associated  with  the  two  linear 
systems  corresponding  to  the  original  piecewise  linear  system, 

o  a  first  test  is  used  to  find  a  lime  interval  [0,6]  such  that  x*  does  not  cross  the  zero  axis  In 
[a,  6]  with  high  probability. 

o  provided  that  such  a  time  interval  [a, 6]  has  been  found,  a  second  test  is  used  to  decide  on 
the  sign  of  x*  in  [a.b],  i.e.  to  decide  which  Kalman  filter  xjj'  or  x^  to  follow  . 


Using  the  same  heuristic  approach  as  in  (2),  i.e.  approximating  some  discrete  processes  by- 
diffusion  processes,  explicit  expressions  have  been  obtained  for  the  selection  of  thresholds. 

Some  numerical  experiments  have  been  performed  on  various  examples,  and  the  proposed 
approximate  filter  has  been  compared  with  the  optimal  filter  obtained  from  the  numerical  so¬ 
lution  of  the  corresponding  Zakai  equation.  It  is  worth  to  mention  that  the  Fortran  programs 
for  the  numerical  solution  of  the  Zakai  equations,  have  been  automatically  generated  by  our 
software  ZPB  which  is  described  above. 


4  Statistics  of  partially  observed  diffusions 

We  have  shown  in  [1]  that  the  Zakai  equation  provides  also  a  way  to  compute  the  likelihood 
function/ratio  in  a  large  variety  of  statistical  problems  for  partially  observed  diffusion  processes, 
including  ;  parameter  estimation,  binary  detection,  change  detection,  etc. 


I 
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An  important  issiip  is  to  prove  that  these  statistical  procedures  based  on  the  likelihood 
approach,  can  provide  good  estimates  or  decisions  in  some  asymptotic  sense.  Consider  for 
example  the  statistical  model 


dA',  =  bB{X,)dt  +  edW,  . 
df  (  =  dt  £  dV^  , 

where  9  6  0  is  an  unknown  parameter,  which  appears  in  the  coefficients  6«(  ),  /is(  )  and  also  in 
the  density  Poi  )  of  the  initial  condition  Xq. 

Computing  the  likelihood  function  for  the  estimation  of  the  unknown  parameter  6  on  the 
basis  of  observations  {V,  ,  0  <  (  <  T}.  involves  the  solution  of  the  Zakai  equation  corresponding 
to  the  associated  filtering  problem.  We  have  proved  in  [6j  the  consistency  of  the  MLE  under  the 
small  noise  asymptotics  c  1  0.  in  the  foUowing  way 

o  using  large  dei'ialiotDi  theory,  it  is  proved  that  the  limiting  points  of  the  MLE  sequence 
belong  to  the  set  of  minimizing  points  of  a  least -squares  type  functional  for  the  estimation 
of  6  in  the  limiting  deterministic  system  c  =  0, 

o  under  an  identifinbilily  properly  oi  this  limiting  deterministic  system,  this  set  reduces  to 
the  "true"  value  of  the  parameter. 


5  'IVansfer  to  the  US 

F.  LeGland  has  presented  some  results  on  time  discretization  of  the  Zakai  equation  [10],  anil 
filtering  of  piecewise  linear  systems  [17],  at  the  IEEE  CDC  in  Tampa  (December  19S9). 

P.  Milheiro  de  Oliveira  has  presented  the  results  on  approximate  filters  for  discrete  time 
systems  [13],  at  the  IEEE  CDC  in  Honolulu  (December  1990). 

E.  Pardoux,  F.  Campillo  and  F.  LeGland  have  participated  to  the  NSF-lNRL-\  Workshop  on 
Stochastic  Analysis,  organized  at  Rutgers  University,  where  the  results  on  particle  approximation 
for  first  order  stochastic  PDE  [4],  and  numerical  approximation  of  nonlinear  filters  and  finite 
time  observers  [7]  have  been  pre.sented  (May  1991). 

E.  Pardoux  and  F.  LeGland  have  participated  to  the  International  Conference  on  Stochas¬ 
tic  Partial  Differential  Equations,  and  have  given  tutoritd  lectures  at  the  School-Seminar  on 
Stochastic  Partial  Differential  Equations,  organized  at  the  University  of  Northern  Carolina  in 
Charlotte,  with  partial  support  of  the  Army  Research  Office  (May  1991). 

E.  Pardoux  has  given  a  series  of  lectures  on  Nonlinear  Filtering  and  Associated  Partial 
Differential  Equations,  in  Ecole  d’Ete  de  Probabilites  XIX  in  Saint-Flour  (August  1989).  The 
lecture  notes  [16]  pre.sent  the  most  recent  developpments  in  the  theory  of  nonlinear  filtering, 
including  results  obtained  by  the  contractors,  for  the  first  time  in  book  form. 
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Introduction 


The  purpose  of  ZPB  is  to  produce  Fortran  programs  for  the  numerical  solution  of  the  Zakai 
equation.  This  equation  allows  to  solve  various  estimation  problems  in  partially  observed 
systems,  such  as 

•  state  estimation  ;  filtering,  fixed-interval  smoothing,  fixed-lag  smoothing,  etc. 

•  detection,  either  off-line  or  sequential, 

•  p^ameter  estimation, 

•  change  detection  (disorder),  either  off-line  or  sequential. 

Given  a  description  of  both  the  model  and  the  problem  to  be  solved,  the  Fortran  pro¬ 
grams  are  generated  with  the  help  of  the  computer  algebra  system  Maple.  This  description 
is  contained  in  a  file  zpb.data,  in  the  form  of  Maple  instructions  keyword :=value;. 

The  generated  Fortran  programs  rely  on  routines  from  the  scientific  library  NAG.  w'hich 
is  assumed  to  be  available. 

In  addition  to  the  Maple  program,  an  interface  is  provided  to  help  the  user  defining 
the  model  and  the  problem  to  be  solved,  and  graphical  tools  are  available  to  visualize  the 
results.  Both  the  interface  and  the  graphical  tools  are  based  on  the  X  Window  system. 

The  possible  models  and  problems  to  be  solved,  and  some  of  the  algorithms  actually 
implemented  in  the  generated  Fortran  programs,  are  presented  in  these  notes. 


Maple  is  a  trtidemark  of  Waterloo  Maple  Software.  NAG  is  a  trademark  of  the  Numerical 
Analysis  Group  Ltd.  PostScript  is  a  trademtirk  of  Adobe  Systems  Inc.  The  X  Window 
system  is  a  trademark  of  the  MIT. 


1 


A  Models  of  Partially  Observed  Systems 


State 


The  class  of  systems  to  be  considered  is  modeled  as  a  m-dimensional  diffusion  process 

dX,  =  b,{X.)dt  +  a,(X,)dW.  , 

where  {Wt ,  t  >  0}  is  a  r-dimensional  Wiener  process  with  covariance  matrix  Q.  This 
includes  the  particular  case  of  systems  modeled  as  the  solution  of  an  ordinary  differential 
equation 

A",  =  6«(-V,)  . 


In  the  case  of  a  noise  driven  state  equation,  it  is  assumed  that  the  state  is 
one-dimensional.  Extension  to  two-dimensional  state  is  planned. 


Observations 


The  state  of  the  system  is  not  directly  observed.  However,  d-dimensional  noisy  nonlinear 
observations  of  the  state  are  available,  either  at  discrete  times  {<i,  tj- '  •  } 

Zk  =  hk(Xi^)  +  I'k  , 

where  {ri,t>2,  -  }  is  a  d-dimensional  Gaussian  white-noise  sequence,  with  non  singular 
covariance  matrix  R,  or  in  continuous  time 

Zt  =  ht(Xt)  +  t'l  , 

where  {u, ,  t  >  0}  is  a  d-dimensional  Gaussian  white-noise  process,  with  non  singular 
covariance  matrix  R.  Introducing  the  integrated  observation  process 

y ;  =  /  z.ds , 

Jo 

the  observation  equation  becomes 

dT,  =  ft,(A',)d<+dVi  , 

where  {V,,  t  >  0}  is  a  d-dimensional  Wiener  process,  with  non  singular  covariance  ma¬ 
trix  R. 

Another  information  about  the  model  is  the  correlation  structure  between  the  state 
noise  and  the  observation  noise. 

It  is  assumed  that  the  state  noise  and  the  observation  noise  are  independent. 
Extension  to  allow  noise  correlation  is  planned,  see  Florchinger-LeGland  [4] 
and  [5]. 
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A  first  description  of  the  model  is  provided  by  the  following  keywords  to  be  defined 
in  the  file  zpb.data 


dim.8tate 

dimension  m  of  the  state 

(integer) 

noisejdriven-state 

true  1 false 

if  noiae_driven_state=true 

dim-noise 

dimension  r  of  the  driving.noise 

(integer) 

observation-mode 

discrete | continuous 

dim.obs 

dimension  d  of  the  observation 

(integer) 

Coefficients 


The  next  step  is  to  provide  the  algebraic  expression  of  the  drift  vector  fc|(  ),  the  diffusion 
matrix  (T((')  (in  the  case  of  a  noise  driven  state),  and  the  observation  function  hk(  )  or  /;,(■) 
depending  on  whether  the  observations  are  available  at  discrete  times  or  in  continuous 
time.  In  addition,  the  probability  distribution  of  the  m-dimensional  initial  state  -Vo  has 
to  be  selected  among  the  following  elementary  probability  distributions 

(i)  Dirac  mais  at  point  lo. 

(ii)  Gaussian  distribution,  with  mean  p  and  covariance  matrix  E, 

(iii)  uniform  distribution  on  an  coordinate  cube  [xi,i2]. 

Extension  to  allow  mixture  of  elementary  probability  distributions  is  planned. 

A  further  description  of  the  model  is  provided  by  the  following  add'.ional  keywords 
to  be  defined  in  the  file  zpb.data 


drift 

drift 

m-vector  (algebraic  Maple  expression)  ] 

if  noise_driven_state=true 

diffusion 

diffusion  (m,r)-matrix  (algebraic  Maple  expression) 

observation 

observation  d-vector  (algebraic  Maple  expression) 

initial 

dirac 1 gauss ian 1 uniform 

1 

Parameters 


The  algebraic  expression  of  the  coefficients  can  contain,  in  addition  to  the  state  variable  x, 
or  xl . XB  if  the  state  is  m-dimensional,  &nd  the  time  variable  t,  some  other  parame¬ 

ters.  A  list  of  these  parameters  is  build  by  the  Maple  program,  from  the  description  of  the 
model  given  in  the  file  zpb.data,  and  stored  in  the  file  .model.  Additional  parameters 
include 
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(i)  parameters  of  the  initial  probability  distribution, 

(ii)  covariance  matrices  Q  and  R  (diagonal)  of  the  noise  processes. 

The  numerical  values  of  all  these  parameters  are  read  at  run  time  by  the  generated  Fortran 
programs  in  two  different  files  simul  .data  and  filt.data,  This  allows  to  consider  mis- 
specified  estimation  problems,  i.e.  to  address  robustness  issues. 


Extensions  to  allow  the  covariances  matrices  to  depend  on  parameters,  and 
to  treat  more  general  robustness  issues  (different  algebraic  expression  of  the 
coefficients  for  simulation  and  filtering),  are  planned. 
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B  Estimation  Problems  and  their  Solutions 

With  the  diffusion  equation 


dXt  =  b,{Xt)dt-{-at{X,)dW,  , 


is  associated  the  following  time  dependent  second  order  partial  differential  operator 


M=1 


with  the  covariance  matrix  ai  =  <Jt  Q  a\  . 

The  state  estimation  problem  consists  in  estimating  the  state  Xt  given  only  the  ob¬ 
servations.  In  the  case  of  discrete  time  observations,  the  observation  cr-algebra  is  defined 
by 

2/c  =  a(2i,  -  ••  ,2k)  , 

whereas,  in  the  case  of  continuous  time  observations,  it  is  defined  by 


>»,  =  <T(r, ,  0  <  s  <  <)  . 


Notation  Throughout  the  paper,  the  scalar  product  and  the  corresponding  norm  in 
R'^,  associated  with  the  symmetric  positive  definite  matrix  are  denoted  by  ( • ,  •  )h-j 
and  I  •  Ir-1  respectively. 


Filtering 


The  goal  here  is  to  estimate  recursively  the  current  slate  A’*  at  time  1  ,  given  only  the 
p2ist  observations  up  to  time  t. 


_ Discrete  Time  Observations 

Introduce  the  conditional  o  priori  and  a  posteriori  probability  densities 

pl{x)dx  =  P{Xt„  G  dx  I  2k-i)  and  pk{x)dx  =  P(X,^  6  di  |  2*)  , 
respectively.  For  any  <*_i  <  t  <  ti,  ,  introduce  also 

pf  (x)  dx  =  P{X,  €  dx  I  2*_, )  . 

The  transition  from  p*_i(i)  to  Pk(x)  is  divided  into  two  steps 
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i.e. 

P(X,edx\y,)  =  c,  p,{x)dx . 
where  Ct  is  a  normalization  constant. 


Sampled  Observations 


Introduce  a  uniform  partition  0  =  <o  <••<<*<•••  of  the  time  interval  [0,  cxj),  with 
lime  step  A  =  <*  —  tk-i-  The  first  step  is  to  sample  the  available  observation  trajectory, 
i.e.  to  build  the  following  sequence  of  compressed  observations 

Vk  =  J  h.(X.)ds  +  ilV,.  -  y„..]  ,  (») 

and  to  use  the  approximate  observation  model 

=  +  {**) 

instead,  where  }  is  a  Gaussian  white  noise  sequence  with  covariance  matrix 

R/A. 

Defining  the  sampled  observation  <7-aIgebra 
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it  is  possible  to  use  the  results  available  for  the  cuse  of  discrete  time  observations.  Intro¬ 
duce  the  conditional  priori  and  a  posteriori  probability  densities 

p*- (i)  di  =  P(X„  e  dx  I  yf., )  and  pt(x)  dx  =  P(A\  G  dx  \  >’f )  , 

respectively.  For  emy  <*_!  <  t  <  t*,  introduce  also 

p1(x)dx  =  P(X,  6  dx  I  yt,) . 

The  transition  from  p*_i(x)  to  pt(x)  is  divided  into  two  steps  just  as  in  the  case  of 
discrete  time  observations,  except  that  the  correction  step  involves 

=  exp{-iA  |yf  -  , 

which  is  the  likelihood  function  for  the  estimation  of  in  the  approximate  observation 
model  (**),  based  on  the  sampled  observation  yf  alone  as  defined  in  (♦),  and  Ck  is  a 
normalization  constant. 


Fixed  Interval  Smoothing 


The  goal  here  is  to  estimate  the  state  X,  at  any  time  0  <  t  <  T,  given  all  the  observations 
in  the  time  interval  [0,7’]. 


_ Discrete  Time  Observations 

Assume  that  the  final  time  T  satisfies  ts  <  T  <  </v+i  for  some  N.  Introduce  the  condi¬ 
tional  smoothing  probability  density 

qk(x)dx  -  P(X,^  edx  \  Zs)  ■ 

For  any  tk~i  <  <  <  fit  ,  introduce  also 

q^(x)dx  =  P(X,  edx\Zs). 

These  probability  densities  are  absolutely  continuous  with  respect  to  the  corresponding 
filtering  densities,  i.e.  qk(x)  —  Pk{^)  ■  Vk{x)  and  =  pf(x)  •  v‘(x)  . 

The  backward  transition  from  v*{x)  to  v*_i(x)  is  divided  into  two  steps 
•  at  time  tk 

Vk{x)  =  Ck-'i/k(x}  vk(x)  , 

where 

'Pit(x)  =  exp'[-|  |xt- At(x)|^-,}  , 

is  again  the  likelihood  function  for  the  estimation  of  A't,,  based  on  the  observation 
z/t  alone,  and  Ck  is  a  normalization  constant. 
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•  between  t*  and  tk-\  ,  the  derivative  n*(i)  solves  the  backward  Fokker-Planck  equa¬ 
tion 

with  initial  condition  t),*  (r)  =  >  which  gives  in  particular  . 

It  is  immediate  by  duality,  that 

(Pk,Vk)  =  (pi'.t't )  =  (pk-uVk-\)  , 

which  implies  that  the  conditional  densities  9t(x)  are  properly  normalized. 


Continuous  Time  Observations 


Here  also,  the  unnormalized  smoothing  conditional  probability  density  is  absolutely  con¬ 
tinuous  with  respect  to  the  corresponding  unnormalized  filtering  conditional  probability 
density,  i.e.  qt{x)  =  Pt(x)  ■  Vt(x)  where  the  derivative  solves  the  backward  Zakai  equation 

du,  -|-  Lt  v,dt  -E  Vt  =  0  , 


i.e. 

P{X,  €  dx  I  >V)  =  CT  ■  g,{x)dx  , 
where  ct  is  a  normalization  constant. 


0  l  T 


_ Sampled  Obsernations 

This  case  is  very  similar  to  the  case  of  discrete  time  observations.  Introduce  the  conditional 
smoothing  probability  density 

q,(x)dx  =  P(X,,  edx\y^). 
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For  any  <<<<*,  introduce  also 

q^{x)di  =  P{X,  £dx\y^). 

These  probability  densities  are  absolutely  continuous  with  respect  to  the  corresponding 
filtering  densities,  i.e.  qk{x)  =  pk{x)  ■  and  q}(x)  =  p*(i)  •  r*(i)  . 

The  backward  transition  from  vic{x)  to  ut_|(i)  is  divided  into  two  steps,  just  as  in  the 
case  of  discrete  time  observations,  except  that  the  correction  step  involves 

4']f{2)  =  exp{-iA|yf-A,,(i)|J,.,}  , 

which  is  the  likelihood  function  for  the  estimation  of  in  the  approximate  observation 
model  (♦♦),  based  on  the  sampled  observation  yf  alone  as  defined  in  (♦),  and  c*  is  a 
normalization  constant. 


The  description  of  the  estimation  problem  to  be  solved  is  provided  by  the  following 
keyword  to  be  defined  in  the  file  zpb.data 


[  problem  filtering  I  smoothing  | 


Extensions  to  other  state  estimation  problems,  suck  as  fixed-lag  smoothing,  or 
to  statistical  problems,  including  parameter  estimation,  detection,  change  de¬ 
tection,  etc.,  either  off-line  or  sequential,  are  planned,  see  Campillo-LeGland  [l]. 
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C  Numerical  Algorithms 

Given  a  model  and  an  estimation  problem  to  be  solved  about  this  model,  both  described  in 
the  Maple  file  zpb.data,  the  purpose  of  ZPB  is  to  provide  Fortran  programs  and  visualiza¬ 
tion  tools,  for  the  numerical  experimentation  and  evaluation  of  the  estimation  algorithm. 
This  involves  two  different  tasks 

•  simulation  :  the  goal  here  is  to  generate  a  trajectory  of  the  state  process,  and  a 
sequence  of  either  discrete  time  or  sampled  observations,  to  be  stored  into  the  files 
zpb. state  and  zpb.obs  respectively. 

•  estimation  :  the  goal  here  is  to  combine  a  priori  information  about  the  model  and 
the  observations  to  be  read  from  the  file  zpb.obs,  in  order  to  solve  the  selected 
estimation  problem.  Results  are  stored  into  the  files  zpb.estim  and  zpb. density. 

From  now  on,  the  state  process  is  assumed  to  be  one-dimensional. 

Extensions  to  allow  two-dimensional  state  process,  are  planned. 


Simulation 


The  time  horizon  T  and  the  time  step  A  between  successive  observations,  are  read  from  the 
file  algo  .data,  under  the  name  tmax  and  dt  respectively.  A  refined  time  step  A*/  =  A/.W 
is  introduced,  where  the  number  M  of  local  iterations  for  the  simulation  is  also  read  from 
the  file  algo. data,  under  the  name  locsimul. 

The  state  Xt„  is  approximated  by  z*  using  the  Milshtein  discretization  scheme  [10] 

'  tl  =  t*-,  , 

•  =  t\-'  -I-  Am  ,  1  <i<M 

.  tk  =  , 


xl  =  x*-i 


4  =  4  '  +  *(-<(4  ')  Am  -f  '}w\ 

+ 1  a,.-,  (xr' )  4.-. (xr' )  [  -  ami  ,  i  < .  <  a/ 


Xk  — 


where  {tuj[,  ■  •  •  is  a  Gaussian  white-noise  sequence  with  covariance  matrix  Q  ■  Am  ■ 

On  the  other  hand,  the  generation  of  the  sequence  of  observations  is  different  whether 
they  are  discrete  time  or  sampled  observations. 
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_ Discrete  Time  Observations 

The  observation  zi,  is  simulated  using  the  approximation  x*  of  the  state  Xt,, 

2k  =  hk{Xk)+Vk  , 

where  u*  is  a  Gaussian  random  vector  with  non  singular  covariance  matrix  R. 

_ Sampled  Observations 

The  sampled  observation  as  defined  in  (♦)  are  simulated  using  the  approximation 
{li,  •  •  ■  of  the  state  process  between  <*_i  and  t* 

I  " 

where  is  a  Gaussian  random  vector  with  non  singular  covariance  matrix  R/A. 


Gaussian  random  variables  are  generated  according  to  the  Box-Muller  algorithm,  see 
Rubinstein  [12j.  A  routine  boxmuller  is  provided  in  the  library  libzpb.a.  Uniformly 
distributed  random  variables  are  generated  by  the  NAG  library  routine  gOScaf,  in  con¬ 
junction  with  gOScbf .  The  seed  of  the  random  generator  is  read  from  the  file  algo  .data 
under  the  name  seed. 

Extensions  to  include  more  accurate  discretization  of  the  deterministic  part  of 
the  state  equation,  are  planned. 


Estimation 


It  follows  from  the  discussion  above,  that  either  for  the  filtering  or  the  smoothing  problem, 
and  whether  the  observations  are  in  discrete  time  or  sampled,  it  is  needed  to  discretize 
the  Fokker-Planck  equation  between  <*_i  and  tk 
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Time  Discretization 


The  time  horizon  T  and  the  time  step  A  between  successive  observations,  are  read  from  the 
file  algo. data,  under  the  name  tnai  and  dt  respectively.  A  refined  time  step  Ap  =  A/P 
is  introduced,  where  the  number  P  of  local  iterations  for  the  prediction  step  is  also  read 
from  the  file  algo. data,  under  the  name  locpred. 

The  filtering  density  p*  is  approximated  by  pt  using  the  implicit  Euler  scheme 


ik  =  > 

<*  =  <r'  +  -  1  <  i  <  p 

.  4  =  <r , 

Pk  =  pk-i , 

•  (/  -  Ap  L,.-iY  Pk  =  p'k'  >  <i  <  P 

pk  =  Ck  '^k  -P^  ■ 


This  approximation  is  based  on 


f*'*  f** 

Pi;  -  Pi;-'  =  P»  Pt\  -  Pil 


Space  Discretization 

Following  Kushner  [8],  the  second  order  partial  differential  operator  Lt  is  approximated  by 
a  finite  difference  matrix,  on  a  regular  bounded  coordinate  grid.  Only  the  one-dimensional 
case  is  considered  here.  The  bounded  domain  is  an  interval  D  =  [x,  x]  .  The  end  points 
z,  X  and  the  mesh  size  6  are  read  from  the  file  algo. data  under  the  name  xmin,  xmax 
and  dx  respectively. 

Let  R{  denote  a  one-dimensional  coordinate  grid  with  mesh  size  6.  For  any  i  €  Rf  , 
let  Ns{x)  =  {z,  z  ±  5}  denote  the  set  of  neighbours.  The  restricted  grid  is  D«  =  D  n  Rf  , 
the  set  of  interior  grid  points  is  Dj  =  {z  €  D<  :  Aj(z)  C  Di}  ,  and  the  set  of  boundary 
points  is  Ts  =  Ds  \  Ds  . 


X -6  X  X  +8 

-• - • - •- 
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~  L\4>(x)=  Y.  L\(x,y)4>(y)- 

yet>t 

The  only  non  zero  terms  in  the  time  dependent  matrix  if  are  the  terms  involving  neigh¬ 
bours 

LUx,x±6)  =  \^a,{x)  +  6bf(x)] 

[a,(j)  +  (5  |6,(x)|] 

and  in  addition  the  following  properties  are  satisfied 

>0  for  y  /  X 


Y  =  0 

Af(x)  =  -Lf(i,x)  =  5:Lf(x,y)  >  0 
v*^ 

where  the  latter  is  a  consequence  of  the  two  others. 

For  any  boundary  point  x  €  Fj  ,  the  definition  of  the  finite  difference  matrix  depends 
on  the  boundary  condition  to  be  satisfied. 

•  absorbing  (stopping)  boundary  : 

if(j'.y)  =  0  for  all  y  e  A^«(x)  , 
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•  reflecting  boundary  : 
at  leftmost  grid  point 

f  L‘t{z,x)  =  -A‘(x) 

I  Lf(i,x  +  6)  =  Af(i) 

[  L,{(x,x  —  ^)  =  0  , 

at  rightmost  grid  point 

L\(x,x)  =  -A‘(x) 

-  £f(x,x  +  S)  =  0 

LUx,x-6}  =  Af(x)  . 

The  nature  of  the  boundary  condition  is  read  from  the  file  algo .  data  under  the  name 
boundary  which  can  take  the  two  possible  values  stoppinglreflection. 

These  properties  show  that  the  time  dependent  matrix  Lf  is  the  instantaneous  jump 
intensity  matrix  of  a  pure  jump  Markov  process  {A',* ,  t  >  0}  evolving  on  the  grid  D(  . 
Conditionned  on  the  current  position  i  €  ,  the  inter-jump  time  and  the  next  position 

are  independent  random  variables.  Moreover,  the  inter-jump  time  is  an  exponential 
random  variable  with  parameter  Aj(x)  ,  and  the  probability  distribution  of  the  next 
position  is  given  by  irs(x,g)  =  Ls(x,g)/As(x)  ,  for  all  y  e  D(  . 


Full  Discretization 

Combining  finite  difference  approximation  of  the  second  order  partial  differential  operator 
Lt,  with  implicit  Euler  time  discretization,  results  in  the  following  sequence  of  linear 
systems 

Pk  =  Pk-i  , 

.  =  pr’ ,  y<t<p 

** 

pk  =  Ck  -^k  Pk  ■ 

This  is  a  tridiagonal  linear  system  which  can  be  solved  by  direct  method  :  first,  the 
matrix  is  factorized  using  Gaussian  elimination  algorithm  with  partial  pivoting,  then  the 
resulting  upper  and  lower  triangular  systems  are  solved.  This  is  done  by  the  NAG  library 
routines  fOllef  and  f041ef  respectively. 

Extensions  to  include  iterative  methods,  or  multigrid  methods  for  the  solution 
of  the  linear  system  are  planned,  and  will  have  to  be  used  in  the  case  of  a 
multi-dimensional  state  equation. 
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I  Parameters  I 

A  list  of  the  algorithm  parameters  is  build  by  the  Maple  program,  and  stored  in  the  file 
.algo.  These  parameters  include 

dt  —  time  step, 

tmin,  tmax  —  ends  of  the  time  interval, 

locsimul  —  number  of  local  iterations  for  the  simulation, 

seed  —  seed  of  the  random  number  generator, 

znin,  zmax  —  bounds  of  the  bounded  space  discretization  grid, 

dx  —  mesh  of  the  space  discretization  grid, 

locpred  —  number  of  local  iterations  for  the  prediction, 

boundary  —  nature  of  the  boundary  condition,  etc. 
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D  Organization  of  a  Session 


The  user  has  first  to  provide  a  description  of  both  the  model  and  the  problem  to  be 
solved.  This  description  is  contained  in  a  file  zpb.data,  in  the  form  of  Maple  instructions 
keyword: “value;.  The  list  of  admissible  keywords  is  summarized  below 


dim-Btate 

dimension  m  of  the  state 

(integer) 

noise-driven^tate 

truelfalse 

if  noise-driven-state“true 

dim-noise 

dimension  r  of  the  driving  noise 

(integer) 

observationjnode 

discrete  I  continuous 

dim.obs 

dimension  d  of  the  observation 

(integer) 

drift 

drift  m-vector 

(Maple  expression) 

if  noise-driven-State=true 

diffusion 

diffusion  (m,r)-matrix 

(Maple  expression) 

observation 

observation  d-vector 

(Maple  expression) 

initial 

dirac IgaussianI uniform 

problem 

filtering  1  smoothing 

The  file  zpb.data  can  be  created  with  any  text  editor.  Under  X  Window,  a  user 
interface  is  provided  to  create  and  modify  the  file  zpb.data  a  jii.ai n  ally. 

When  this  first  stage  is  completed,  the  Maple  pr.^grain  is  called,  and  Fortran  programs 
are  generated  and  compiled.  Two  data  file.-^  .tt^del  and  .algo  are  also  generated,  which 
contain  respectively  the  list  of  parameters  relevant  to  the  model  and  the  problem  to 
be  solved.  The  user  has  to  provide  nuinerical  ’ul  les  for  the.se  parameters,  which  are 
contained  in  three  different  input  files  simul.data,  esti.n.data  and  algo. data.  Here 
again,  these  files  can  be  created  and  modified  with  any  text  editor.  Under  X  Window, 
the  user  interface  allows  to  create  and  modify  the  files  simul.data,  estim.data  and 
algo. data  automatically. 

When  this  second  stage  is  completed,  the  Fortran  files  are  executed  and  the  results  are 
stored  in  the  output  files  zpb. state,  zpb.estim  and  zpb. density,  which  contain  the 
simulated  state  sequence,  the  estimated  state  sequence  (usually  the  conditional  mean), 
and  the  conditional  density  sequence. 

Under  X  Window,  graphical  tools  are  provided,  which  allow  to  visualize  the  data  in 
the  output  files,  in  the  following  way 

•  a  representation  of  both  the  simulated  state  sequence  and  the  estimated  state  se 
quence  vs.  time,  is  provided  in  a  first  window,  as  well  as  a  shaded  area  representing 
some  confidence  region  for  the  r.tnditional  distribution  at  a  given  level, 

•  a  time  can  be  selected  in  the  first  window,  and  a  representation  of  the  conditional 
density  at  the  selected  time  is  then  provided  in  a  second  window, 
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•  it  is  also  possible  to  visualize  in  the  second  window,  the  continuous  time  evolution 
of  the  conditional  density. 

In  earlier  versions  of  the  software,  Fortran  programs  were  provided  for  the  graphic 
visualisation  of  the  results,  based  on  the  Graphical  Kernel  System  GKS  [3,7]  library. 

The  organization  of  a  ZPB  session  is  summarized  by  the  following  flowchart. 
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algo.data  estim 


Fortran 

program 


apb.state 

zpb.esiim 

zpb.densiiy 


f 

□ 

vjjua/irazron 

X  Window 
interface 

GKS 

program 
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E  Example 


Consider  the  nonlinear  filtering  problem  for  the  following  quadratic  sensor  system 
f  dXi  =  -PX,dt-^dWt  , 

\  dY,  =  (Xt^-eX^)diJrdV,, 

where  {W, ,  <  >  0}  and  {VJ ,  <  >  0}  are  independent  Wiener  processes  with  variance  Q 
and  R  respectively.  The  initial  condition  Xq  is  a  Gaussian  random  variable  with  mean 
/i  and  variance  E.  Note  that  when  e  =  0,  the  observation  function  is  linear  and  the 
conditional  law  is  Gaussian,  whereas  when  e  0,  the  observation  function  is  symmetric 
around  xo  =  — l/2e,  and  is  not  injective,  which  can  result  in  a  multi-modal  conditional 
density  when  the  signal  is  in  the  neighbourhood  ci  xq. 

This  model  is  described  by  a  sequence  of  Maple  instructions  in  the  file  zpb.data.  The 
corresponding  Fortran  files  are  automaticaly  generated  and  compiled,  as  well  as  the  file 
.model  containing  the  list  of  model  parameters,  see  Figure  1. 

Two  numerical  examples  are  considered  below,  for  different  values  of  the  parameter  c. 
Numerical  values  of  the  model  parameters  are  given  in  the  following  table 


0 

0.2 

£ 

0.0  or  0.25 

0,0 

y 

1.0 

Q 

1.0 

R 

0.001 

In  both  Figure  2  and  3,  the  level  of  the  shaded  confidence  region  is  0.95 
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observation_Dode:>'continuous‘ ; 

noise_driven_state ; =true ; 

dim_state:”l; 

initial:  =  'gaussi2Ln‘ ; 

drift :=-beta*x; 

dim.noise ;“1 ; 

diffusion:=l: 

dim.obs;*!; 

observation:=x+eps*x"2 ; 
problem:* 'filtering' ; 


beta 

eps 

xO 

qO 

qq 

rr 


Figure  1:  Description  file  zpb.data  and  corresponding  parameter  file  .model 
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0  2  4  6 


Figure  3:  Non-injective  observation  function  (e  =  0.25) 
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Abstract  Some  computable  approximate  expressions 
are  provided  for  the  conditional  law  of  diffusion  processes 
observed  tn  continuous  time.  The  numerical  schemes  are 
derived  through  an  approximation  of  the  origincU  filter¬ 
ing  problem.  Given  a  partition  of  the  time  interval,  tl^is 
procedure  consists  in  sampling  the  available  observation 
sample  path  and  approximating  the  a  priori  law  of  the 
diffusion  process.  This  results  tn  approximation  schemes 
for  the  Zakai  equation,  for  which  rate  of  convergence  are 
provided. 

1  Introduction 

The  purpose  of  this  paper  is  to  give  computable  and 
accurate  approximate  expressions  for  the  conditional  law 
of  a  diffusion  process  observed  in  continuous  time.  Since 
this  conditional  law  depends  on  both 

■  the  a  priori  information,  provided  by  the  semi-group 
{Pt .  t  >  0}  or  equivalently  the  infinitesimal  genera¬ 
tor  L, 

■  the  available  observation  sample-path  {Vt  i  f  2  0}- 

the  approximation  problem  under  consideration  should 
reduce  in  some  sense  to 

■  approximate  the  a  priori  law  of  the  original  diffusion 
process,  e.g.  by  the  more  simple  a  priori  law  of  some 
other  process, 


2  The  filtering  problem 

On  a  measurable  space  (O,  T)  are  given  a  probability 
measure  P,  and  a  pair  of  stochastic  processes  {Xt ,  t  >  0} 
and  {  ,  t  >  0}  taking  values  in  R"  and  respectively, 

such  that  under  P 

dY,  =  h(X,)dt  +  dV,  .  (1) 

where  {V, ,  (  >  0}  is  a  standard  Wiener  process,  inde¬ 
pendent  of  {Xt ,  t  >  0}. 

Note  that  the  a  priori  law  of  the  signal  {Xt .  t  >  0} 
is  not  specified  at  this  point.  The  observation  function 
satisfy  the  following  hypothesis 

h  it  a  meaturable  and  bounded  function  from 
R™  to  R"*. 

Reniark  2.1  As  usual,  (1)  is  the  mathematical  way  of 
expressing  that  some  measurement 

=  +  ,  (2) 

is  available  at  time  t,  where  {r/t ,  t  >  0}  is  a  Gaussian 
white-noise  process,  independent  of  {Xt .  t  >  0}. 

Introduce  the  7-algebras 

P,  =  7(X, .  0  <  f  <  t)  , 

y,*  =  7(n-y.,,<T<f),  y,  =  y?. 


■  extract  the  most  useful  information  from  the  av^- 
able  continuous  time  measurements  {yi  ■  t  >  0}. 

The  general  situation  of  filtering  a  signal  process  from 
noisy  continuous  measurements  will  be  considered.  At 
each  step  of  the  approximation  procedure,  the  general 
formulas  vrill  be  applied  to  the  particular  case  of  diffu¬ 
sion  processes,  in  order  to  check  whether  or  not  some 
computable  expression  has  been  obtained.  Note  that  only 
time  discretization  is  considered  here:  the  discretization 
with  respect  to  the  space  variable,  e.g.  the  approximation 
of  the  partial  differential  operator  L  by  finite  differences 
is  not  taken  into  consideration. 

*Re*eartIi  portuily  lupported  by  Syttenw  Retevch  Center,  Uni- 
▼enity  td  M*r7lnn<l,  and  by  USACCE  under  Contract  DAJA4^ 
t7>ll~029e 


Tbe  problem  U  to  estimate  Xt  from  i.e.  to  compute 
the  coDditio&al  (a  posteriori)  law  of  Xf  given  yt^  defined 
by  E(0(X,)  I  yt).  Introducing 

z;  =  exp  h'{Xr)dYr  -  ^  J*  |fc(X,)|’  dr  J  , 

and  Zt  =  ,  it  is  standard  that,  for  all  T  >  0  the  orig¬ 

inal  probability  measure  P  is  equivalent  on  [0,T]  to  the 
reference  probabUity  measure  R*  with  Radon-Nikodym 
derivative  Zj,  so  that  under  Pt  {y, ,  t  >  0}  is  a  stan¬ 
dard  Wiener  process,  independent  of  {Xt ,  t  >  0}. 

By  the  Bayes  formula 


E(0(X.)  I  y.)  = 


E«(»(X.)X.  |y,) 
Et(X.  I  y.) 


80  that  it  18  enough  to  compute  {pt  >  ^  ^  <iefine<l  by 

(p,,^)  ^  Et(0(jf.)z,  I  y.) . 

In  the  particular  case  where  the  signal  ,  <  >  0}  is  a 
diffusion  process  with  infinitesimal  generator  X.  {pt ,  t  > 
0}  is  the  unique  solution  of  the  Zakai  equation 

dpt  =  X’ptdi  +  Vp,dyt  .  (3) 

It  is  readily  seen  on  this  equation  that  pt  depends  on  the 
a  priori  law  of  {Xt ,  t  >  0}  represented  by  the  partial  dif¬ 
ferential  operator  X,  and  on  the  observation  sample-path 
{Yt,  t  >  0}.  However,  equation  (3)  is  not  computable 
and  should  be  approximated.  The  approach  presented 
here,  is  to  rather  approximate  the  original  filtering  prob¬ 
lem  by  a  simpler  problem,  and  to  consider  the  resulting 
equation  for  the  conditional  law  in  this  new  filtering  prob¬ 
lem  as  an  approximation  to  equation  (3).  In  Section  5, 
the  rate  of  convergence  for  such  approximations  will  be 
provided,  by  direct  numerical  analysis  of  equation  (3). 

The  presentation  adopted  follows  Korezlioglu-Maz- 
ziotto  [2].  There  is  indeed  three  successive  steps  in  the 
global  approximation  procedure.  In  the  first  step,  sam¬ 
pling  and  data  compression  of  the  observation  sample- 
path  {Vt ,  f  >  0}  is  performed.  Then,  thf  signal  {Xt ,  /  > 
O^is  approximated  by  some  p^  .n~  .  constant  process 
{Xt  T  ^  ^  0}.  In  the  last  step.  M  a  priori  law  of  the 
process  {Xt ,  t  >  0}  is  ap*  yy  .ted.  Only  the  first  two 
steps  will  be  considered  here. 


3  Sampling  of  the  observation 
sample-path 

Throughout  the  paper,  an  infinite  partition 
0  =  to  <  <  ■  "  <  fn  <  • 

of  [0, -hoc)  is  introduced,  to  be  denoted  by  x,  with  time 
increments  6i  =  —  fj. 

Sampling  a**  i  data  compression  is  the  pre-processing 
procedure  by  which  the  new  information  contained  in 
the  continuous  measurements  received  in  the  time  inter¬ 
val  t  <  ti+i  *nd  reprewnted  by  ,  i«  (umm&rized 
into  a  finite  number  of  random  variables.  This  is  formal¬ 
ized  in  the  following 

Definition  3.1  An  admissible  sampling  procedure  rel¬ 
ative  to  the  partition  t  is  a  family  >  0}  «/ 

a -algebras  which  satisfy,  for  all  i  >  0 

fij  3^1^,  is  generated  by  a  finite  number  of 
random  variables, 

In  addition,  the  following  noiationA  are  used 


m-l 


Here  are  two  examples  of  admissible  sampling  proce¬ 
dures,  to  be  considered  throughout  the  paper. 

Example  1.  Define 

A 

-  y*.  =  z.ds,  (4) 

which  is  the  mean  value  of  the  actual  measurements  (2) 
on  the  time  interval  <  s  <  In  this  example,  5^)^, 
is  generated  by  the  random  variable  Note  that,  under 
the  reference  probability  measure  {(i « t  >  0}  are 
mutually  independent  d-dimensional  Gaussian  random 
variables  with  zero  mean  and  covariance  matrix  Sil. 


Example  2.  Define 


jf  (s-U)dY.  =  jJ^^  J  yrdrds, 

1  /■'•+*  1  r'-*'  r‘ 

j:  J  (ti+i  - ‘)dY,  =  —  jf  y^drds. 


which  are  two  other  different  ways  of  computing  some 
mean  value  of  the  actual  measurements  (2)  on  the  time 
interval  t.  <  s  <  fj+i.  In  this  example.  is  gen¬ 

erated  by  the  random  variables  {{  and  fj.  Note  that 
and  that,  under  the  reference  probability  mea¬ 
sure  P’.  t  >  0}  are  mutually  independent  2d- 

dimensional  Gaussian  random  variables  with  zero  mean 
and  covariance  matrix  where 


In  particular,  the  characteristic  function  of  satis- 

6es 


x(o.6)  =  E’(exp|a‘{J-(-6‘{f}) 

=  exp{i(|o|^ -ho’6-(- |6|’)ii}  . 


(5) 


The  problem  is  now  to  estimate  Xt,  from  yt.*  i-c.  to 
compute  the  conditional  law  of  Xt,  given  ye,-  By  the 
Bayes  formula 

■  •  EU^t,  |y..) 

BO  that  it  is  enough  to  compute  {p. ,  t  >  0}  defined  by 

(pi,^)  =  E*(^(A-,,)Z,,  |y..)  . 

The  first  step  is  provided  by  the  following 
Proposition  3.3  Introduce 

=Et(z.‘.v. 

Then 


(6) 


Proof. 

(Pi+l.^) 

=  iy.„j 

=  En^(jf..*.)2^Hi:^.  ly,.^.) 

=  |.F.,vy.,  v5^;^,)|y,..,.) 

=  E^[t^i+ifl»]^..  ly...,,) .  n 

Going  back  to  the  examples  introduced  above,  the  ex¬ 
pression  for  EJi^j  will  be  derived,  and  it  will  be  checked 
whether  or  not  the  additional  hypothesis  that  the  signal 
{>^t ,  t  >  0}  is  a  diffusion  process  can  lead  to  computable 
expressions. 

Example  1  (Continued).  For  the  sampling  procedure 
defined  by  (i,  it  U  proved  in  [2]  that 

,  (7) 

where 

lK^jj'"'h(X.)ds  . 


4  Piecewise  constant  approxima¬ 
tion  of  the  signal  process 

The  purpose  of  this  section  is  to  investigate  the  effect 
of  replacing  the  signal  process  {Xt ,  t  >  0}  by  a  piecewise 
constant  process  {Xt « t  >  0}  whose  values  on  '^pieces"" 
are  related  in  some  way  to  the  values  taken  by  the  orig¬ 
inal  signal  process  at  some  particular  instants.  This  is 
formalized  in  the  following 

Defizution  4.1  A  proceas  {7t « t  ^  0}  is  subordinate  to 
the  process  {Xt « i  >  0}  rtlativeiy  io  the  partition  ir  if, 
for  alii  >0 

Xf  is  ^t,+i  -meaaurahU,  U  <t  < 

The  following  example  provide  a  particular  class  of 
subordinate  process,  to  be  used  throughout  the  paper. 

Example.  For  all  i  >  0  are  given 

•  a  partition  {Aj ,  1  <  j  <  it(i)}  of  the  time  interval 

•  an  increasing  sequence 

U  <  r'  <  ■■■  <  t(  <  ■■■  <  T*'**  <  f,+i  . 

Then  the  piecewise  constant  process  {Xt .  t  >  0}  defined 
by  _ 

Xt  =  x^  ,  ate  Al  . 


However,  replacing  this  expression  into  (6)  does  not  pro¬ 
vide  a  computable  expression,  even  if  the  additional  hy¬ 
pothesis  that  the  signal  {Xt ,  t  >  0}  is  a  diffusion  process 
is  introduced. 


Example  3  (Continued).  For  the  sampling  procedure 
defined  by  it  can  be  proved  that 

HJ:..  =exp{[ftlr«!  +  WNJ 

=  exp 

•exp  {[h!]-{.‘  -  •  exp 


where 


®( 


t  —  ti 


ti+i  -  ti 

1 


\h(X.)d,, 


and  the  weight  function  w  is  defined  for  all  0  <  $  <  1  by 

!i;(e)  =  W  -  2. 

Here  again,  replacing  this  expression  into  (6)  does  not 
provide  a  computable  expression,  even  if  the  additional 
hypothesis  that  the  signal  {Xt ,  t  >  0}  is  a  diffusion  pro¬ 
cess  is  introduced. 


is  subordinate  to  {Xt .  t  >  0}  relatively  to  the  partition 
X.  There  is  a  similar  class  of  subordinate  processes,  where 
the  time  interval  to  be  partitioned  is  rather  . 


The  problem  is  to  chose  {Xt .  t  >_0}  in  such  a  way 
that  the  conditional  law  of  7t,  given  ^t,  i>  more  simple 
to  handle  than  the  conditional  law  of  Xt,  given  ^t, .  and 
is  even  computable  in  the  particular  case  where  the  signal 
{-Itt ,  t  >  0)  is  a  diffusion  process. 

Introduce 

x  =  exp  h-(Xr)dYr  -  i  jf'  |h(X.)|=  dT|  (9) 


and  Zt  =  Zf  Under  the  reference  probability  measure 
Pl,  the  processes  {JCt ,  f  >  0}  and  {Vt ,  t  >  0}  are  in¬ 
dependent.  so  that  the  stochastic  integral  in  (9)  is  well 
defined,  although  {Xt ,  t  >  0}  is  not  necessarily  adapted. 
Therefore,  it  is  possible  for  all  T  >  0  to  define  a  new 
probability  measure  7  equivalent  on  [0,7]  to  with 
Radon-Nikodym  derivative  7t,  so  that  under  P 

dYt  =  h(X,)dt  +  JV,  , 


where  {7t .  t  >  0}  is  a  standard  Wiener  process,  inde¬ 
pendent  of  {Xt ,  t  >  0}.  _ 

The  problem  is  now  to  estimate  7t,  from  Yt,,  i.e.  to 
compute  the  conditional  law  of  7t,  given  ^t,.  By  the 
Bayes  formula 


80  that  it  U  enough  to  compute  {Pi ,  i  >  0}  defined  by 


Then  h,  =  h{Xt,^^)  and  ■  Therefore 


F.+i*  =  )  I  V  ) . 


It  follows  from  the  proof  of  Proposition  3.2  that 

1  ,  (10) 

where,  for  all  i  >  0 

Going  back  to  the  examples  introduced  above,  some 
particular  piecewise  constant  subordinate  processes  will 
be  considered,  the  corresponding  expression  for 
be  derived,  and  it  will  be  checked  whether  or  not  the 
additional  hypothesis  that  the  signal  {Xt ,  t  >  0}  is  a 
diffusion  process  can  lead  to  computable  expressions. 

Example  1  (Continued).  For  the  sampling  procedure 
defined  by  has  the  same  form  than  (7)  where 

now  ^ 

('*'h(X.)dB. 

^  Jt. 

Two  different  piecewise  constant  subordinate  processes 
will  be  considered. 

I  la  I  Define 

X,  =  Xt,  .  \iti<t<  tj+i  . 

Then  hi  =  h{Xt,)  and  =  'ii(Xt,)  ,  where  for  all 
I  €  R"* 

**(i)^exp{fc-(i)<.-i|h(i)P<i.}  .  (11) 

Therefore 

Fi+,^  =  4'i(A-,,)E'(0(X,.,,)lJ^..)  . 

Under  the  additional  hypothesis  that  the  signal  {Xt .  (  > 
0}  is  a  diffusion  process  with  semi-group  {Pi ,  f  >  0} 

Ft..,^  =  *i(X,.)[Ps.^](X,.)  . 


and 


(p.+i.«)  =  E»(*i(X,.)[Ps.0l(X,,)2..  ly....) 

=  (Pi.'ftl'Ps.^])  1 

so  that  {pt  1 1  >  0}  satisfies  the  following  recurrence 

P.+1  =  ^’/.[♦*P.]  .  (12) 

which  is  a  computable  expression,  and  can  be  considered 
as  a  time  discretization  scheme  for  the  Zakai  equation 
(3).  The  rate  of  convergence  of  this  approximation  will 
be  considered  in  Section  5. 

I  lb  I  Define 

Xt  =  Xt,.j  *  if  fi  <  f  ^  f.+i  • 


Under  the  additional  hypothesis  that  the  signal  {Xt ,  t  > 
0)  is  a  diffusion  process  with  semi-group  {Pt .  f  >  0} 

I^i+i(t  =  Ps,[’I'j^l(Xt.)  , 

and 

(p.^i,^)  =  EHP,.l*.,/,](Xt.)Zt.  |y..,.) 

=  (P..-P5.[*i^])  • 

This  results  in  the  following  recurrence 

Pi+i  =  ♦i[^V.Pil  .  (13) 

which  gives  another  time  discretization  scheme  for  the 
Zakai  equation  (3). 

Remark  4.2  In  the  numerical  scheme  (12)  (resp.  (13)) 
the  transition  from  p^  to  p^^|  reflects  the  following  sit¬ 
uation:  A  new  measurement  is  available,  which  is  a 
compression  of  the  information  provided  by  {^t  •  ti  <  t  < 
according  to  (4).  This  measurement  is  interpreted 
as  a  noisy  nonlinear  observation  of  Xt,  (resp.  )• 
is  combined  with  the  current  estimate  p^  of  Xf,  to  pro¬ 
duce  an  estimate  pj^j  of 


Example  2  (Continued).  For  the  sampling  procedure 
defined  by  (^{.^J),  has  the  same  form  than  (8) 

where  now 


‘•4r 

^  1. 


S?(--^ — ^l-)A(X.)ds  . 

t»-n  - 

u7(  *‘^‘'f  )MX.)d. . 


The  following  family,  parametrized  by  0  <  a  <  of 
piecewise  constant  subordinate  processes  will  be  consid¬ 
ered 


Xt. 

if  t  e  x“ 

X,.„ 

ift€[f.,t.+,)\A“ 

where  for  all  i  >  0.  Af  denotes  the  following  subset  of 
the  time  interval 


(i  +  a^i 


t.+i  -  a6, 


t 


It  is  theu  possible  to  find  a  particular  value  ao  for 
which 

h\  =  h{Xt.,t)-  h\  =  h{Xt,). 

Therefore  (8)  becomes 


h|:.,  =  ♦!{X,.,.)*J(X,,) 

,-xp{X\h{X,„,)-h{Xt,tS,} 


where  for  all  x  € 

*!(i)  iexp{A‘(a:)fj  -  , 

*!(x)iexp{V(x){t-i|Mi)|’4.}  . 

and 

•«P  -  MJftJl’ii}  I  V . 

Under  the  additional  hypothesis  that  the  signal  {X( ,  t  > 
0}  is  a  diffusion  process  with  semi-group  {Pt ,  f  >  0} 

Ui^i^  = 

and 

(P.4.1.'^)  =  En4';(A-,.)Q6.[*!0](A-..)r,.  |y..^.) 
so  that  {p^  <  t  >  0}  satisfies  the  following  recurrence 

P.+, = ♦{Qj, (♦;?.!  •  (14) 

where  the  family  of  operators  {Q^ ,  ^  >  0}  is  defined  by 


Remark  5.1  Similar  rate  of  convergence  has  already 
been  obtained  for  approximation  of  nonlinear  filtering 
problems,  in  Picard  [6j  and  Newton  [4].  The  proof  in  [6] 
uses  only  probabilistic  arguments  and  does  not  consider 
the  Zakai  equation,  but  rather  the  underlying  nonlinear 
filtering  problem.  In  [4],  the  Zakai  equation  is  consid¬ 
ered  for  pure-jump  Markov  processes  rather  than  diffu¬ 
sion  processes,  and  the  approximation  procedure  relies 
on  the  stochastic  Taylor  formula  of  Wagner-Platen  [7,8]. 

Define,  for  all  x  €  R”* 

*;(x)  =  exp{h-(x)(y,  -  y.)  -  i|h(x)p  (t  - »)}  . 

Note  that  two  operators  are  involved  in  (16) 

‘  the  unbounded  operator  L*  which  generates  the  ad¬ 
joint  semi-group  {Pj* .  f  >  0}  , 

*  the  multiplication  operator  B  which  generates  the 
two-parameter  stochastic  semi-group  {^t  •  ^  ^  ^  ^ 

t}. 

so  that  the  time  discretization  scheme  (15)  is  a  Trotter¬ 
like  product  formula  for  the  Zakai  equation  (16).  See 
Bensoussan-Glowinski-Rascanu  [1]  for  a  related  work  in 
this  direction. 


Qt4,  i  Et(^(j:,+<)exp  -  h{X,)\H}  I  J^.)  . 

Note  that  ♦|(i)  4'*(i)  =  ,  aod  that  the  opera¬ 

tor  Qt,  can  be  seen  as  a  perturbation  of  the  semi-group 
Pl^.  However,  it  is  not  obvious  that  (14)  is  a  computable 
expression  and  can  be  considered  as  a  time  discretization 
of  the  Zakai  equation  (3).  The  relevant  analysis  and  the 
rate  of  convergence  of  this  approximation  will  be  consid¬ 
ered  elsewhere. 


5  A  product  formula  and  its  rate 
of  convergence 

The  purpose  of  this  section  is  to  study,  from  the  point 
of  view  of  numerical  analysis,  the  following  recurrence 

P.+  l  =  .  (15) 

derived  in  the  previous  section,  as  a  time  discretization 
scheme  for  the  Zakai  equation 

dpt  —  L*ptdt  +  h'ptdYt  .  (16) 

Recall  that 

(p,.,^)  =  Ei(0(X.,)z,.  |y,.) . 

(p.,^)  =  E»{0(X,.)7,,  |3?<.). 

so  that  p^  should  be  **clo8e**  to  pt,-  Indeed  it  will  be 
proved  below  that 


The  main  assumption  of  this  section  is  that  the  signal 
f  >  0}  is  a  diffusion  process 

dXt  =  b{Xt )  dt  +  a{Xt)dWt  ,  Xo  ~  po(x)  dx 
observed  in  continuous  time  through  measurements 
dYt^h{Xt)dt-\‘dVt  . 


Define  a  =  ct(t“  and  a*  =  ^  — — .  The  coefficients  sat- 
isfy  the  following  hypotheses 


(i)  po  w  o  density  on  R”*, 

(ii)  a  is  0  continuous  ond  hounded /unction  on 
R”*  ond  o  is  o  uniformly  elliptic  m  x  m 
matrix,  i.e.  o(x)  >  qI  , 

(Hi)  b  and  a  are  hounded  ond  meosurahte /unc¬ 
tions  from  R”*  to  R™  , 

(iv)  h  is  a  measurable  and  bounded  function 
from  R”*  to  R*^  . 


The  infinitesimal  generator  of  the  semi-group  t  > 
0}  is  defined  by 


a’ 


*.;=i 


dxidxj 


d 

dzi 


and  satishes.  under  the  hypothesea,  the  following  coer- 
civity  property:  for  all  u  €  i?‘(R'") 

2(Lu,n)  +  <  Alul’  ,  (17) 


{E’|P.-P..|’}’^'<C«  , 

where  S  is  the  mesh  of  the  partition  x  up  to  time  t,.  and 
I '  I  denotes  the  norm  in  the  Sobolev  space  L^(R”*). 


where  |1  ■  {|  denotes  the  norm  in  the  Sobolev  space 
H'  (R™).  Existence  and  uniqueness  of  a  solution  to  equa¬ 
tion  (16)  is  proved  in  Pardoux  [5]  and  Krylov-Rozovskii 
[3], 


Therefore 


Theorem  5.2  Suppose  that,  in  addition  to  (i)-(iv) 

(v)  a,  6  and  a  have  hounded  first  derivative, 

(vi)  h  has  bounded  derivatives  up  to  order  2  . 

Then,  ifpo  € 

max.  {E’K  <  CS  .  (18) 


Proof.  Under  the  hypotheses,  it  follows  from  Theo* 
rem  2.1  of  [5]  that  p  €  C((0,r] ;  H^R'"))).  Also, 

for  all  1  >  0  ,  pj  6  ir^(R"*))  and  in  addition 

max  E^lipfclP  <  C  . 

For  t  >  tk,  define  Vt  =  i  so  that  p^  =  Vt* 

and  Pjj^i  =  Differentiating  with  resp  '•t  to  t  gives 

dvt  =  dVt 

=  rvtdt  +  iBptydyt  +  /3^dyt , 

where  the  perturbation  term  is  defined  by 

Note  that  /3  6  l^(Q:  C([tfc.r] :  if^(R"‘))).  The  identity 
of  energy  of  [5]  applied  to  the  difference  e  =  v  —  p.  and 
the  coercivity  property  (17)  give 

E*k,P<E'K]=  +  C  fz^e.fds  +  C  f\'\p.\Us. 

Jtk  Jtt 

Assume  the  following  estimate  holds 
E*|/3.|'  <C|»-Up  «p{C(*-t*)}  E*||p*f  .  (19) 
Applying  Gron wall's  lemma  and  setting  <  =  tjt+j.  yields 
E’lPk+i 

<  [E’lp*-p,J^  +  C|tt+i -t*|’)  exp{C(tk+i -tk)}  . 

and  the  rate  of  convergence  (18)  follows  from  the  discrete 
Gronwall  lemma.  The  end  of  the  proof  is  devoted  to 
proving  estimate  (19). 

First,  the  following  perturbation  result  bolds 

Jti, 

provided  u  is  smooth  enough.  Under  the  hypotheses, 
[L'B  —  BL‘]  is  a  bounded  operator  from  ff'fR”’)  to 
L^(R"‘)  ,  so  that  it  is  enough  that  u  6  £r'(R”*)  for  (20) 
to  hold.  Now.  ['t|*Pkl  €  fl^’(R"*)  a.s.  so  that 

=  fp,'-.  [L'B  -  BL-]  p;.,,  [♦|‘Pk]* 

Jtk 


\Pt?  <c\t-t„\^  ll«:‘Pkll^ 

and 

E»|/3.|^  <  C|t-tkpEt|H-J‘pkll’ 

<  C|t-tk|"  exp{C(t-fk)}  Etlpkll"  , 

which  proves  (19).  □ 

Remark  6.3  The  same  rate  of  convergence  holds  for  the 
approximation  scheme  (13). 

The  next  step  is  to  approximate  the  adjoint  semi¬ 
group  (Pj*.  f  >  0}  itself,  i.e.  to  approximate  the  asso¬ 
ciated  Fokker-Planck  equation.  For  instance,  using  an 
implicit  Euler  scheme  results  in  the  following  approxi¬ 
mation  scheme 
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A  time  discretization  scheme  is  provided  for  the  Zakai  eqaation,  a  stochastic  PDE  which  gives  the 
conditional  density  of  a  diffusion  process  observed  in  white-noise.  The  case  where  the  observation  noise 
and  the  state  noise  are  correlated,  is  considered.  The  numerical  scheme  is  based  on  a  Trotter-like 
product  formula,  whi'~h  exhibits  prediction  and  correction  steps,  and  for  which  an  error  estimate  of 
order  S  is  proved,  where  S  is  the  time  discretization  step.  The  correction  step  is  associated  with  a 
degenerate  second-order  stochastic  PDE,  for  which  a  representation  result  in  terms  of  stochastic 
characteristics  has  been  proved  by  Krylov-Rozovskii  [13]  and  Kunita  [15,17].  A  discretization  scheme 
is  then  provided  to  approximate  these  stochastic  characteristics.  Under  the  additional  assumption  that 
the  correlation  coefTicient  is  constant,  an  error  estimate  of  order  ,/S  is  proved  for  the  overall  numerical 
scheme.  This  has  been  proved  to  be  the  best  possible  error  estimate  by  Elliott-Glowinski  [7]. 

KEY  WORDS:  DilTusion  processes,  correlated  noises,  nonlinear  filtering,  Zakai  equation,  stochastic 
PDE,  stochastic  characteristics,  time  discretization. 

1.  INTRODUCTION 

The  purpose  of  this  paper  is  to  present  a  computable  time  discretization  scheme  for 
the  Zakai  equation  of  nonlinear  filtering  with  correlated  noises,  and  to  provide  an 
estimate  of  the  rate  of  convergence. 

In  the  case  of  independent  noises,  the  problem  has  been  studied  by  Kushner 
[18],  Newton  [21],  Korezlioglu-Mazziotto  [11],  Bennaton  [1],  DiMasi-Pratelli- 
Runggaldier  [6],  Picard  [22],  Bensoussan-Glowinski-Rascanu  [2]  and  Le  Gland 
[20],  Some  of  these  authors  have  actually  considered  the  associated  Zakai 
equation.  Time  discretization  schemes  have  been  provided  with  a  rate  of  conver¬ 
gence  of  order  S,  where  d  is  the  time  discretization  step. 

In  the  case  of  correlated  noises,  the  problem  has  been  studied  by  Elliott- 
Glowinski  [7],  The  best  approximation  of  the  continuous  filter  based  on  the 


•Research  partially  supported  by  USACCE  under  Contract  DAJA45-90-C-0008. 

tAlso:  INRIA  Lorraine,  CESCOM,  Technopole  de  Metz  2000,  4  rue  Marconi,  F-57070  Metz,  France. 


233 


234 


P.  FLORCHINGER  AND  F.  LE  GLAND 


values  of  the  observation  process  at  a  regular  partition  (with  mesh  S)  has  been 
considered,  and  it  has  been  proved  that  the  rate  of  convergence  is  of  order  y/l. 
However,  no  algorithm  is  provided  to  actually  compute  this  approximation. 

The  paper  is  organized  as  follows.  In  Section  2,  the  nonlinear  filtering  problem 
is  presented.  Some  results  on  the  Zakai  equation,  and  on  a  related  degenerate 
second-order  stochastic  PDE,  are  recalled  in  Section  3.  A  Trotter-like  product 
formula  is  then  considered,  with  an  error  estimate  of  order  S.  However,  this 
numerical  scheme  is  not  computable.  In  Section  4,  a  representation  result  in  terms 
of  stochastic  characteristics  is  presented  for  the  degenerate  second-order  stochastic 
PDE.  This  part  follows  mainly  the  work  of  Krylov-Rozovskii  [13] — see  also 
Kunita  [15,17].  A  time  discretization  scheme  is  presented  in  Section  5,  based  on 
an  approximation  of  the  stochastic  characteristics.  Under  the  additional  assump¬ 
tion  that  the  correlation  coefficient  is  constant,  an  error  estimate  of  order  y/s  can 
be  proved.  In  addition,  this  numerical  scheme  is  actually  computable,  as  far  as  time 
discretization  is  concerned,  i.e.  up  to  space  discretization. 


2.  THE  FILTERING  PROBLEM 

Consider  the  following  stochastic  differential  system,  defined  on  the  probability 
space  (n,.^,P) 


dX,  =  b(X,)  dt + a{X,)  dW,+ p(X,)  d  V, 
dY,  =  h(X,)dt+dV, 

where  the  non  observed  component  takes  values  in  R",  and  the 

observation  {T„t^0}  takes  values  in  R"*.  {H',t^0}  and  {K„t^0}  are  independent 
Wiener  processes  of  appropriate  dimension,  with  covariance  matrix  /  (identity)  and 
r  respectively.  For  the  clarity  of  exposition,  it  is  assumed  throughout  the  paper 
that  r  =  I.  In  addition,  the  random  variable  A'o  is  independent  of  the  Wiener 
processes,  with  probability  distribution  po(x)dx. 

Throughout  the  paper,  it  is  assumed  that  the  coefficients,  b,  a,  p  and  h  are 
globally  Lipschitz  continuous  functions  defined  on  R",  so  that  the  stochastic 
differential  system  has  a  unique  strong  solution.  The  following  definitions  are  used; 
a^aa*  and  c^pp*.  In  particular,  it  is  not  assumed  that  either  a  or  c  is  uniformly 
elliptic. 

With  the  diffusion  process  {XpigO}  are  associated  the  two  partial  differential 
operators 


dxidxj  (“i  oxi 

I  I  b'f. 

2 dxidxj  dXi 
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Another  family  of  partial  differential  operators  to  be  considered  is 

iti  SXi 


Introducing 


4cxp  |} h*{X,)dY,  -  if  itj. 


z.^z 


0 
t  • 


it  is  standard  that,  for  all  T>0  the  original  probability  measure  P  is  equivalent  on 
[0,  T]  to  the  reference  probability  measure  Pf  with  Radon-Nikodym  derivative  Zj-, 
so  that  under  Pf 


dX,  =  b(X,)dt  +  a(X.)  dW.  +  p{X,)  IdY,  -  k{X,)  dtl  (2.1) 

where  and  are  independent  Wiener  processes,  with  covariance 

matrix  I  (identity),  and  the  random  variable  Xq  is  independent  of  the  Wiener 
processes,  with  probability  distribution  po(x)dx. 

The  Bayes  formula  gives 

E(/wi^,)=gKwy>. 


and  in  addition 


Enf(X,)Z,l^.)^jf{x)p,(x)dx, 

where  the  unnormalized  conditional  density  satisfies  the  Zakai  equation 

[25] 


dp,  =  L*p,dt+  X  Btp.dYl 


(2.2) 


Consider  then  the  following  decomposition  of  the  Zakai  equation  (2.2) 
dp,^Lip,dt+A*p,dt+  X  Bfp.dyf, 

k*i 


AAL-Lo=^  £ 

Bxidxj 


where 
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On  one  hand,  the  partial  differential  operator  Lq  generates  a  strongly  continuous 
semigroup  On  the  other  hand,  it  is  possible  to  associate  a  stochastic 

semigroup  {Qf,0gsgt}  with  the  following  degenerate  second-order  stochastic 
PDE 


dp,  =  A*p.dt+  i  Btp.dYl 


(2.3) 


which  is  studied  below.  Therefore,  it  is  worth  studying  the  following  Trotter-like 
product  formulas  for  approximating  the  original  Zakai  equation  (2.2) 


Pi*i  =  Q',U,PlPi> 


(2.4) 


where  tj,  and  0  =  to<fi<  -<ti<  — 

The  main  interest  of  such  product  formulas  is  that  the  original  equation  has  been 
split  into  a  second-order  deterministic  PDE  (prediction  step),  and  a  degenerate 
second-order  stochastic  PDE  (correction  step).  In  the  case  of  independent  noises, 
this  stochastic  PDE  reduces  to  a  zero-order  equation,  for  which  there  exists  a 
straightforward  explicit  solution.  In  the  case  of  correlated  noises,  a  representation 
result  is  available  by  the  method  of  stochastic  characteristics  (i.e.  involving  the 
stochastic  flow  of  diffeomorphism  associated  with  a  SDE  driven  by  the  obser¬ 
vation  process),  see  Krylov-Rozovskii  [13]  and  Section  4  below. 

Remark  2.1  A  similar  prediction-correction  numerical  scheme  was  obtained  by 
Kushner  [18]  in  the  case  of  independent  noises. 

Remark  2.2  Written  in  Stratonovicu  form,  equation  (2.3)  is  a  first-order 
stochastic  PDE.  For  such  an  equation,  one  can  use  the  representation  result  of 
Kunita  [15, 17],  and  translate  the  stochastic  characteristics  equations  from  Strato- 
novich  form  back  to  ltd  form,  to  recover  the  representation  result  of  [13]. 

As  a  consequence  of  the  above  discussion,  there  will  be  two  steps  in  designing 
the  approximation  to  the  original  Zakai  equation  (2.2) 

•  first  use  a  Trotter-like  product  formula, 

•  then  approximate  the  solution  of  the  degenerate  second-order  stochastic  PDE, 
by  approximating  the  stochastic  flow  of  difieomorphisms  involved  in  the 
stochastic  characteristics  method  of  [13]. 

It  will  be  proved  that  the  first  step  can  be  achieved  with  a  rate  of  convergence  of 
order  d,  whereas  the  rate  of  convergence  for  the  second  step  (and  a  fortiori  for  the 
global  approximation  procedure)  is  of  order  only,  where  d^maXjjo^i- 


3.  TROTTER-LIKE  PRODUCT  FORMULA 

For  all  n^O,  let  H”  denote  the  space  of  real-valued  Lebesgue-measurable  functions 
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on  R"  whose  generalized  derivatives  up  to  order  n  are  square-integrable,  with 
norm  H-H. 


IHI--  Z  J|i>^WPdx<oo. 

OiWSm 

In  addition,  the  following  shorthand  notations  will  be  used  throughout  the  paper 

H^iHloandlHl^lHI, 

The  beginning  of  this  section  is  devoted  to  recall  existence,  umqueness  and 
regularity  results  for  the  Zakai  equation 


dp.^L*p,dt+  I  Bfp.dyf.  (3.1) 

and  the  degenerate  second-order  stochastic  PDE 

4 

dp,  =  A*p,dt+ I  Bfp.dyf,  (3.2) 

t  =  i 

with  semigroup  {QJ.Ogsgt}. 

Although  no  coercivity  hypothesis  is  satisfied,  the  following  result  is  proved  in 
Krylov-Rozovskii  [13]. 

Theorem  3.1  Let  I  be  fixed.  Assume  that 

•  a  and  c  have  bounded  derivatives  up  to  order  max  (n,  2), 

•  b,  p  and  h  have  bounded  derivatives  up  to  order  n, 

•  the  initial  condition  satisfies  pf^eH". 

Then  both  Eqs.  (3.1)  and  (3.2)  have  a  unique  solution  peM^(0,  T;H").  In  addition 

peL^(Q;CJ[0.r];//-)), 
and  the  following  estirruite  holds 


Et  sup  IIp.II^  ^IIpoIIJ' 
.osisr 


.cr 


Similarly,  for  the  Fokker-Planck  equation 

P'.=^P„  (3.3) 

and  the  following  deterministic  PDE  associated  with  (3.2) 

p;=A*p,. 


(3.4) 
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with  semigroup  and  respectively,  it  holds 

Theorem  3.2  Let  I  he  fixed.  Assume  that 

•  a  and  c  have  bounded  derivatives  up  to  order  max(n,2), 

•  b  has  bounded  derivatives  up  to  order  n, 

•  the  initial  condition  satiates  PqeH*. 

Then  both  Eqs.  (3.3)  and  (3.4)  have  a  unique  solution  peL^(0,T;If).  In  addition 

peCJUi,T]-,H% 

and  the  following  estimate  holds 

sup  ||p,||ig(|poll^e'^’'- 
os(sr 

Remark  3.3  In  the  case  where  the  coefficients  a  and  c  arc  uniformly  elliptic,  a 
slightly  stronger  theorem  holds,  see  Krylov-Rozovskii  [12]  and  Pardoux  [23]. 

□  Error  Estimate 

The  purpose  here  is  to  study  one  of  the  Trotter-like  product  formulas  (2.4). 
Theorem  3.4  Consider  the  following  approximation  scheme 

Pi.x^^PlQ'.UA  (3.5) 


Assume  that 

•  a  c,b,  p,  and  h  have  bounded  derivatives  up  to  order  3, 

•  the  initial  condition  satires  Po  €  H^. 

Then  p,  approximates  the  solution  p,,  of  the  original  Zakai  equation  (3.1)  with  a  rate 
of  convergence  of  order  S.  Indeed 

{Et|pj-p,.n‘'^gC5||po||3. 

Proof  The  idea  is  to  get  an  equation  for  with  4>  smooth  enough, 

that  is  similar  to  the  original  Zakai  equation  for  p„  except  for  some  perturbation 
terms  which  have  to  be  estimated.  This  gives  an  estimate  of  the  one-step  error,  and 
the  global  estimate  is  obtained  using  the  Gronwall  lemma. 

Differentiating  with  respect  to  t 


dv,-Liv,dt  +  Pf. 


A*Q;<pdt+  z  B:(p,d>dY\ 

k»l 


;1 
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=  Liv,dt+A*v,dt+  £  B:v,dYi 

k-1 

k-1 

^^L*v,dt+  f  B!v.dYi  +  f.dt+  i  gidYf, 

k-1  k-1 

where  the  perturbation  terms  are  defined  by 

respectively.  The  difference  £,^v,  —  p,  satisfies 

d£.  =  L*E,dt+  X  B:e,dYi  +  f,dt+  i  gidYf. 

k-1  k-l 

Using  estimates  of  [13] 

Et|£,l^  ^[^EtieJ'  +  CEtj |/,P  dr  +  CEt^f  ]  llgt|P 
Assume  that  the  following  estimates  hold 

(3.6) 

Etllgtil"  g  C(T-s)^Et||(^|l?  (3.7) 

Then  the  Gronwall  lemma  would  yield 

Ethl^  ^  [Etle.P  +  C(t-s)*Et||</>||i] 

provided  0gL^(£J;H^).  Now,  it  follows  from  the  assumptions  and  from  Theorem 
3.1,  that  pieL^{{i;H^)  for  all  i,  so  that  setting  s=t,,  t=t,+,  and  ^=p. 

Etlp.*  1  -  P.. ,  .N  CEtlp-.-p..|*  +  C(t,.  >  -  ti)’Etllp<ll!] 

and  the  result  follows  from  the  discrete  Gronwall  lemma.  The  end  of  the  proof  is 
devoted  to  proving  estimates  (3.6)  and  (3.7). 

□  Estimate  (i.tf) 

The  following  perturbation  result 
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IPt.A*  -  =  J  P?-..[LJA*  -  A*LS]P?.^ 


holds  for  u  smooth  enough.  It  follows  from  the  assumptions,  that  the  partial 
differential  operator  D^[L5A*— A*!^]  is  bounded  from  to  In  addition,  it 
follows  from  Theorem  3.2  that  {Pf,t^0}  is  a  strongly  continuous  semigroup  in 
both  and  Therefore 


l/.l  g  i  \Pf.  ,DP?  .,Q‘4>\  dz'  g  C(T  -  s)|lQ;d>||, 

$ 

Then 

Etl/.P  g  C(i-sm||<2f^||§  C(T-s)^Etl|<^i|^^''-*'. 

□  Estimate  (3.7) 

Similarly,  the  following  perturbation  result 


[pr.A*  -  Btp:.,}u= j  pt-^ms:  -  bjlsjp?  . dz. 


holds  for  u  smooth  enough.  It  follows  from  the  assumptions,  that  the  partial 
differential  operator  — BJLS]  is  bounded  from  to  H‘.  In  addition,  it 

follows  from  Theorem  3.2  that  {P*,t^0}  is  a  strongly  continuous  semigroup  in 
both  H*  and  H^.  Therefore 


llgjll  gf  llpr...D»p?..c;.^l|dT'gc(T-s: 


Then 

Et|l«i||"  g  C(T  -s)^Et||Q;d>|ii  e‘^''-’'g  qr  -s)^Eti|</>||!  □ 

Remark  3.5  In  the  case  where  the  coefficient  a  is  uniformly  elliptic,  the  same 
error  estimate  can  be  proved  imder  weaker  regularity  assumptions  on  the 
coefficients  and  the  initial  condition,  see  Florchinger-LcGland  [8]. 

Remark  3.6  It  is  possible  to  approximate  the  stochastic  differential  equation 
(2.1),  in  such  a  way  that  the  approximation  pi  given  by  (2.4),  is  actually  the 
conditional  density  of  the  approximate  process  at  time  t„  given  the  observations 
This  problem  will  be  addressed  elsewhere. 

The  approximation  scheme  (3.5)  is  not  yet  computable.  First,  the  Fokker-Planck 
equation  (3.3)  with  semigroup  {Pr,t^0},  has  to  be  approximated:  this  is  a  rather 
standard  problem,  for  which  one  can  use  e.g.  the  backward  Euler  scheme,  or  some 
other  approximation  scheme.  On  the  other  hand,  some  representation  results  are 
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presented  in  the  next  section,  which  can  be  used  for  the  approximation  of  the 
degenerate  second-order  stochastic  PDE  (3.2),  with  semigroup  (Qf,  Ogsgl}. 


4.  STOCHASTIC  CHARACTERISTICS 

Parallel  to  the  decomposition  of  the  stochastic  PDE  (2.2),  there  is  a  similar 
decomposition  for  the  stochastic  differential  equation  (2.1).  With  the  first 
component 


dX,=b(X,)dt  +  l{X,)dW„ 

is  associated  the  partial  differential  operator  Lo  and  the  Fokker-Planck  equation 
(3.3).  It  is  proved  below  that  the  second  component 

dX.  =  piX.)ldY-h(X.)dtl  (4.1) 

is  associated  with  the  degenerate  second-order  stochastic  PDE  (3.2)  and  the 
corresponding  deterministic  PDE  (3.4). 

The  beginning  of  this  section  is  devoted  to  recall  results  concerning  the 
stochastic  flow  of  diffeomorphisms  associated  with  the  stochastic  differential 
equation  (4.1). 

Theorem  4.1  Let  ij,.,(  )  be  the  stochastic  flow  associated  with  the  forward  stochas¬ 
tic  differential  equation 


dQ,=^p{Q[dY-h{Qdtl  (4.2) 

Assume  that  the  coefficients  h  and  p  have  bouruied  derivatives  up  to  order  (n-t-l). 
Then  ij,,,(  )  is  a  C-diffeomorphism  in  R”. 

Under  the  assumption  that  the  coeflicient  p  has  bounded  derivatives  up  to  order  2, 
the  inverse  map  ^,~/(  )  is  given  explicitly  as  the  {backward)  stochastic  flow  ij,  ,{  ) 
associated  with  the  backward  stochastic  differential  equation 

dri,  =  p{ri,)®ldY,-h{ri,)  dr]  -  Po{r],)  dt,  (4.3) 

with 


I  1 

The  regularity  of  ^,,,(  )  was  first  proved  by  Blagoveschenskii-Freidlin  [3],  whereas 
the  rest  of  the  theorem  is  proved  in  Kunita  [16]. 

Proposition  4.2  The  Jacobian  J,_,{  )  {i.e.  the  determinant  of  the  Jacobian  matrix) 
of  the  diffeomorphism  {,.,(•)  satisfies 
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J,.,(x)^cxp 


with 


-  ]  ho{i..M)  dx-l  dt({.,  ^x))  </t|  (4.4) 


a,  A  f;  ^=divp,. 

i-i  dXi 


l^k^d 


-aI 


dpi  dpi 


2*-!  ij=i  5xj  5Xi  t  = 


Proof  Transform  first  the  stochastic  differential  equation  (4.2)  into  Stratono- 
vich  form 


=  P(Q  °  idY.-h(Q  dt]  -ipo(^.)  dt. 

Similarly  to  the  Liouville  formula  for  ordinary  differential  equations,  see  Hartman 
[10],  it  holds 

if  logy.,,(x) =a*(^,.,(x))  0  ldY,-h(^M  dO  -  ho(^,.,(x))  dt div  Po(f,.,(x))  dt. 
Transforming  back  to  Ito  form 

d  log  J,.,(x)  =  a*(^,.,(x))[<f  - /i(^,.,(x))  dt]  - 
-  i  div  po(iJ,.,(x))  dt  +  iao(^,,iW) 

Now  it  holds 


divp(,= 


d  m  ^ 

1  1  f 

t=i  dx, 


«.*  i  i  i  i 

*-l  (-1  v-*l  k^l  i.J~l 


which  finishes  the  proof  □ 

Remark  4.3  Note  that  [d,.X'J«.j(  ))]”‘  is  actually  the  Jacobian  of  the  inverse 
diffeomorphism  f/, ,,(•). 


Define 
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^exp  |}  h*a.Jx))  dY,-^\  |h({..,(x))|*  Jt  j. 


(4.5) 


and 


®,..W ^2...(x)[y,.,(x)]- ‘  =exp |} h^{i,  Jx))  dY, -  J  J |/i({..,(x))P  dr 

(.*  2, 

-  ja*(^,..(x))[dy,-l,(^,.,(x))dT]  +  jM^,.,(x))  dT  +  idE(f,.,(x))drj. 
Introduce  the  following  definition 


(4.6) 


or  equivalently 


fi?4(^s.r(x))  =  q{x)©,  ,(x). 

where  the  same  notation  has  been  used  as  in  the  previous  section.  This  will  be 
justified  by  the  Theorem  4.8  to  be  proved  below. 

Remark  4.4  Under  the  additional  assumption  that  the  coefficient  p  has 
bounded  derivatives  up  to  order  2,  the  Lemma  6.2  of  [16,  Chapter  2]  gives  the 
following  explicit  expressions  in  terms  of  backward  Ito  stochastic  integrals 

S,,,(»?,.^x))  =  exp  |j  h*{ri,Jx))®dY,  -  1 }  |h(;?,.,(x))|^  dr  V»;,,,(x))  dr  J, 

JUriUx))  =  exp  |}  a*(f?,.,(x))©[dy,  -/»(»,,  Jx))  dr] 


-  j  Kifli.Ax))  dr  -  J  a{f?,.,(x))  dr  - 1  ao(';,.r(x)) 
where  the  coefTicients  ho  and  Xq  have  already  been  defined  as 

_(  _  A  X'  V'  doLi,  j  ^  ^  S^pi  i 


*0^  I  I  Ppi  and  ao^  I  I  V  V 

*-i  i-i  dxj  j.j  dx,  J—,  i.^idXjdXj 


Therefore 
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r,,^x)^©,.^»f,.j(x))=exp 


i  h*(*lUx))®dY,  -  ^  J  |M»?,.,(JC))P  dx 

M  ^  M 


i  x*{ti,,jl.x))®[dY,-h{ti,Xx))  dx^  +  iHfi,Jx))  dx  +  ^ao{ri,Jx))  <iT|.  (4.7) 


Remark  4.5  If  p  =  0,  then  ,(x)=x  so  that 

mx)  =  q{x)  exp  mx){  Y,  -  K.)  -i|h(x)|^(t  -  s)}, 
which  is  actually  the  explicit  solution  of  the  equation 

dq.=  i  h,q,dY}, 

k=l 

with  initial  condition  q  at  time  s.  In  this  case,  (2.4)  reduces  to  the  discretization 
schemes  considered  in  [2, 18,20]. 

First,  the  following  stability  result  holds 

Proposition  4.6  Let  n^O  be  fixed.  Assume  that 

•  c,  p  and  h  have  bounded  derivatives  up  to  order  (n+ 1), 

•  the  initial  condition  satisfies  qeH”. 

Then  ^,q  is  a  square  integrable  random  variable  with  values  in  H".  In  addition,  the 
following  estimate  holds 


Proof  It  is  enough  to  prove  the  result  for  n  =  0. 

Using  the  change  of  variable  x  =  ri,  fy)  i.e.  >  =  ij,,,(x) 

Et|09P  =  Et  J[k(t?,.,(>'))|©,.,(»7,.,(>'))]^dy=  J  |9(x)|^E{0,,,(x)}  dx, 
and  the  result  follows  from  the  estimate 

sup  E{©,.,(x)} □ 

xcR"* 

Another  property  of  the  two-parameter  stochastic  semigroup  {CJ.O^sgr}  is 
provided  by  the  following 

Proposition  4.7  Let  {T^.t^O}  be  the  semigroup  generated  by 

I  ■" 

A=-  y  c‘-^— — 

ex,8xj 
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Then 

Ete;=7'.*_.. 

Proof  Using  the  same  change  of  variable  as  in  the  proof  of  Proposition  1.5,  it 
holds 


(Et(QJ9),/)  =  Et  f  q(riay))®UnUy))f(y)  dy  =  J  q{x)EU(iUxm  dx, 
for  any  test-function  f.  Now,  under  the  original  probability  measure  P 

di,=p{i,)dV„ 

where  {K„r^0}  is  a  Wiener  process  with  covariance  matrix  /.  Therefore 

mQ’.q)J)  =  (q^  T,-,f)=iT*.^J).  □ 

The  following  representation  result  of  the  solution  of  Eq.  (3.2)  in  terms  of  the 
stochastic  characteristics  is  the  stochastic  counterpart  of  the  usual  method  of 
characteristics  for  linear  first-order  PDE.  It  has  been  proved  by  Krylov-Rozovskii 
[13]  and  Kunita  [15, 17], 

Theorem  4.8  Let  {GJ,s^t}  be  defined  by  {4.6).  Then,  the  unique  solution  of 
equation  (3.2)  satiates 


q.(x)  =  Q‘.q,M.  (4.8) 

Proof  The  proof  given  below  is  essentially  that  of  [13].  Introduce 

where  {d>„s^T^t}  is  deterministic. 

It  follows  from  the  Ito  formula  that  g,^Et(Cr9t)  satisfies 

q>^*qr+  I  Btq.d^'i,  (4.9) 

with  the  initial  condition  ^j  =  Et(q,).  On  the  other  hand,  define 

w/x)  4  E*|^/({.ix))  exp  |i  <l>m,,Ax))  dr'  JJ, 


where  undei  the  probability  P* 
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d^,  =  pii,)ldVt  +  <l>,dTl 

and  is  a  Wiener  process  with  covariance  matrix  1.  By  the  Feynman- 

Kac  formula,  satisfies  a  PDE  which  is  dual  to  (4.9),  so  that 

(q„/)=(«„  w,).  Consider  now  the  right-hand  side  in  the  representation  result  (4.8). 
T^en 


,(x))dT|[J,,(>;,,(x))]~‘, 

with 

Sf,(x)  ^exp  |}  [h(^,.,(x))  -F  <t>X  dY,--J  |h(^..,(x))  -t- 

Define  next  — Et(f?  CJ<J,).  The  Fubini  theorem,  the  change  of  variable  x  =  r),  ,{y) 
and  the  Lemma  6.2  of  [16,  Chapter  2]  give 


f?  •  C?9,(x)  =  q^fl,.Jix))Et,{ri,Jx))  exp  \  j 


(v, 


f„/)  =  EtJ/(y)4,(f/,,,(y))Ef,,(»j,.^y))exp|j <^?h(;7,,,(y))dT|[J,  ,(y))]'  ‘  dy 

=  Et  J  f(i,Jx))q,{x)EtJx)  exp  Ij  <^,*h({,.Xx))  drj  dx 


=  J  9,(x)Et  j^/(<?,.,(x))Hf,,(x)  exp  |j  <i>*h{i,_,(x))  dx 
=  J  exp  |j  <t>fh(^,Jx))  dij 


dx 


dx  =  (q„w,). 


It  follows  that  (q„f)=(v„f)  for  arbitrary  test-function  /  and  arbitrary  {0„s^Tg 
f},  which  finishes  the  proof.  □ 


5.  APPROXIMATION  OF  THE  STOCHASTIC  CHARACTERISTICS 

It  has  been  proved  in  Section  4  that  the  stochastic  semigroup  {Qf,  O^s^t} 
associated  with  the  degenerate  second-order  stochastic  PDE  (3.2)  satisfies 

c;</>(x)=<#>(Ox))rjx),  (5.1) 

where  ;;,_,(•)  is  the  inverse  of  the  stochastic  flow  of  diffeomorphisms  {,,,(■) 
associated  with  the  stochastic  differential  equation  (4.2),  and  r,,,(x)  has  been 
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defined  in  (4.7).  The  purpose  of  this  section  is  to  investigate  approximations  of 
(5.1). 

Considering  that  )  is  also  the  stochastic  flow  of  diffeomorphisms  associated 
with  the  backward  stochastic  differential  equation  (4.3),  it  is  natural  to  consider 
the  following  approximation 


Q‘Mx)^4>ifiax))tax),  (5.2) 

where 

n,Jx)^x-p(x)lY,-Y,- h(x){t  -  s)]  +  po(>c)(t  -  s), 
and 

r../x)  Aexp{/,*(x)(y;-  y.)-i(fi(x)P(t-s)-a*(x)[l^-  y.-/,(x)(t-s)] 

+  a(x)(r-j)  +  ao(Jc)(t-5)}. 

are  computable  approximations  of  i;,.j(x)  and  r,,,(x)  respectively,  both  depending 
only  on  the  inaements  {Y,—  Y,). 

Remark  5.1  One  possible  approach  would  be  to  approximate  >;,  ,(•)  by  the 
stochastic  flow  of  diffeomorphisms  associated  with  the  ordinary  differential 
equation  obtained  from  (4.3)  by  replacing  the  observation  sample-path  {y„0gtg 
7}  with  some  regular  approximation,  such  as  the  Euler  stepwise  approximation  or 
the  polygonal  interpolation.  The  numerical  analysis  of  such  an  approximation 
should  not  be  very  difficult.  However,  the  resulting  approximation  would  not  be 
explicitly  computable. 

The  remainder  of  this  section  is  devoted  to  studying  the  rate  of  convergence  of 
this  approximation.  First,  a  stability  result  similar  to  Proposition  4.6  is  needed. 

Condition  A  Let  n^O  be  fixed.  Assume  that  the  initial  condition  satisfies  qeH". 

Then  Q’q  is  a  square  integrable  random  variable  with  values  in  H".  In  addition, 
the  following  estimate  holds 


Remark  5.2  Because  is  not  a  diffeomorphism,  this  stability  result  can  not 
be  proved  in  the  same  way  as  in  the  proof  of  Proposition  4.6.  The  following 
proposition,  which  is  proved  in  the  Appendix,  shows  that  Condition  (A)  bolds  in 
the  simple  case  where  the  correlation  coefficient  p  is  constant.  Whether  this 
remains  true  in  the  general  case — or  how  to  modify  the  approximation  scheme  in 
such  a  way  that  Condition  (A)  holds  without  any  additional  assumption  on  the 
correlation  coefficient — is  still  an  open  problem  (however,  see  Remark  A.l  below). 

Proposition  5.3  Let  n^O  be  fixed.  Assume  that 

•  p  is  constant, 

•  h  has  bounded  derivatives  up  to  order  n. 
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Then  Condition  (A)  holds. 

Remark  5.4  The  approximations  and  r,_,(x)  are  based  on  the  explicit 

expressions  for  and  r,./x),  given  in  (4.3)  and  (4.7)  respectively.  This  explains 
why  the  regularity  assumptions  on  the  coefficient  h  are  different  in  Proposition  4.6 
and  Proposition  5.3. 

Remark  5.5  In  the  case  where  the  correlation  coefficient  p  is  constant,  the 
approximations  of  >i,.,(x)  and  r,./x)  take  the  simple  form 


•ji. .( Jc)  -  X  -  p[  K,  - 1;  -  /»(x)(t  -  s)], 
and 

r,.^x)  4cxp  mx){Y- y;)-i|/i(x)|^(t-s)}, 

respectively. 

Next,  the  following  proposition  provides  an  error  estimate  for  commuting  the 
operator  and  spatial  derivatives. 

Proposition  5.6  Let  n^O  and  a  a  multi-index,  be  fixed.  Assume  that 

•  p  has  bounded  derivatives  up  to  order  (n  +  |a|  +  2), 

•  h  has  bounded  derivatives  up  to  order  (n  +  |a|), 

•  the  initial  condition  satiates 
Then,  under  Condition  (A) 

{Et||e;D«9  -  g  cyr^ii^iu 

Here  again,  the  proof  of  this  proposition  is  given  in  the  Appendix. 

□  Overall  Error  Estimate 

The  main  result  of  the  paper  is  provided  by  the  following 
Theorem  5.7  Consider  the  following  approximation  scheme 

Pi*i  =  PlQ\U,Pi- 


Assume  that 

•  a  and  c  have  bounded  derivatives  up  to  order  4, 

•  b  and  p  have  bounded  derivatives  up  to  order  3, 

•  h  has  bounded  derivatives  up  to  order  2, 

•  the  initial  condition  satires  po  e  H^. 

Then,  under  Condition  (A),  p,  approximates  the  solution  p„  of  the  origirml  equation 
{3.1)  with  a  rate  of  convergence  of  order  yfl.  Indeed 
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Proof  In  view  of  Theorem  3.4,  it  is  enough  to  prove  that 
{Et|A-PiP}*'*^Cv^||po||2- 

Similarly  to  the  proof  of  Theorem  3.4,  the  idea  is  to  get  an  equation  for  Q’(f>  with 
(f>  smooth  enough,  that  is  similar  to  the  original  Eq.  (3.2)  for  except  for  the 
initial  condition  and  for  some  perturbation  terms  which  have  to  be  estimated.  This 
gives  an  estimate  of  the  one-step  error,  and  the  global  estimate  is  obtained  using 
the  Gronwall  lemma.  Throughout  the  proof,  the  summation  convention  over 
repeated  indices  i,  j,  is  used. 

Differentiating  both  sides  of  (5.2)  with  respect  to  t 

dQ‘4>(x) = ^  (fiux))  Pi(x)[dyf  -  fc*(x)  df] + pi>(x)  dt  j  r,.^x) 

+^^|^^(<i,.s(x))^I  [pi(x)pi(x)di]r,.(x) 

+  4>(n,.,(x))\  X  Kix) dYi-\\h(x)\^dt-  X  a»(x)[dyf-hj(x)dt] 

L«;  =  i  ^  *=>1 

+  a(x)  dt  +  ao(^)  ^  !^(^)  -  j  r,..(^) 

+|^(^..5W)|^-  Z  pi(x)[M*) -«»(*)] ‘^tjr,..(x) 

= ^  c'-'(x) (^,.,(x))r,  jx)  dt 
2  dx.  exj 

+  j^PoW+  Z  Pkix)itk{x)^^{riU^))T‘,Jx)dt 

+  [a(x)  +  aoix)  +  i|a(x)P]<^(^,  Jx))r,  Jx)  dt 
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-  i  pi(x)^ifiax))rux)dYi 

**1 


Now,  it  can  be  checked  that 

1,  „  1 


a  +  ao+zla|*=x 


Idxidxj 


and  Po+  2.  P***=-r-- 
*T,  dx. 


Therefore,  it  holds 


d^<l) 


'■^{x)^(riax)) 

OXi  CXj 


dQ‘Mx)=‘^\c‘-^ 

I;'*'  f,  >’"■<*»  -"I  '■■■•<*>  ■" 

+  1^  ^h,(x)<l)(ij.Jx))-pi(x)^{rj,Jx))  -  ^{xmax))^ T,Jx)dYi 
+  Z  ^h^{x)Q’4>(x)- pi{x)Q’^{x)  -  ^(x)Q’<i>{x)^dY^ 


so  that 


dQ’(t>  =  A*Q^.<t>dt+  X  B:Q’<t>dYi 

The  difference  satisfies 

dE,  =  A*e,dt+  X  Bfe,dYt+/,dt+  f 
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where  the  perturbation  terms  are  defined  by 


respectively.  Using  estimates  of  [13] 

Et|£.P  g |^Et|£,|^  +  CEti  l/.l^ dr  +  CEt^I  }  l^f  11^ 

Moreover,  it  follows  from  Proposition  5.6  that 

EtU|^SC(T-s)Et|l<^i|ie^'''^ 
Et|kJ|PgC(T-s)Et|l<^l|^e^''-*>, 
and  therefore  the  Gronwall  lemma  yields 

Et|e,l^  ^  [Et|e,|^  +  C(r-s)^Et||<^!|i] 

provided  e  Now,  it  follows  from  the  assumptions  and  in  particular  Condition 
(A),  that  for  all  i,  so  that  setting  s-t,,  r  =  tj+,,  4>=Pi  and 

Etie:;*  .P.-e;:*  .PiP  ^  [Et|p.-P,p+ c(i<* ,  -g^Etiip- nn 


Et|p’ .  1  -p-i. ,  1^ = Hp»m.,Pi-Q'L.pit 

^CEtl^i-P.i"  +  -t,)*Et||ftl|i] 

and  the  result  follows  from  the  discrete  Gronwall  lemma.  □ 

A  further  step  in  the  time-discretization  would  consist  in  approximating  the 
Fokker-Planck  semigroup  {P,*,t^0},  using  some  classical  approximation  scheme. 
For  instance,  using  the  backward  Euler  scheme  would  result  in  the  following 
global  approximation  scheme 


with  the  same  error  estimate. 
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□  Particle  Approximation 

Another  possible  approach  to  approximate  the  degenerate  second-order  stochastic 
PDE  (3.2) — based  also  on  the  representation  (S.l)  in  terms  of  stochastic  character¬ 
istics — ^would  be  to  use  particle  methods,  adapting  the  results  presented  in  Raviart 
[24]  for  deterministic  first-order  PDE.  The  basic  idea  is  to  solve  exactly  Eq.  (3.2) 
for  an  approximation  of  the  initial  condition,  rather  than  approximate  the 
stochastic  characteristics  as  was  done  before. 

Suppose  that,  at  time  tj  an  approximation  of  the  conditional  probability 
distribution  q(x)  dx  is  available,  in  terms  of  a  convex  linear  combination  of  Dirac 
masses  sitting  at  some  particle  locations  with  corresponding  weights 

{af,fc6X}  i.e. 


9(x)dx~  X  af^(Jc-*?)- 

ktX 


(5.3) 


Solving  exactly  Eq.  (3.2)  in  weak  sense,  with  the  approximation  (5.3)  as  initial 
condition,  gives  the  following  approximation 

ktK 


for  the  solution  at  time  tj+,.  The  new  particle  locations  {x{'+,,ilce A}  and  the 
corresponding  weights  {ai+,,ksK}  are  computed  according  to 

=  and  a?4.i  =  fl?H,„,,.,(x?), 

where  ?,.,(  )  is  the  diffeomorphism  associated  with  Eq.  (4.2),  and  E,,,(  )  has  been 
defined  in  (4.5). 

The  error  estimate  associated  with  this  particle  approximation  will  be  studied 
elsewhere. 


6.  CONCLUSION 

A  time-discretization  scheme  of  the  Zakai  equation  for  diffusion  processes 
observed  in  correlated  noise  has  been  proposed,  based  on  the  stochastic  character¬ 
istics  introduced  in  [13,15,17].  Under  the  additional  assumption  that  the  correla¬ 
tion  coefficient  is  constant,  it  has  been  shown  that  the  rate  of  convergence  of  this 
approximation  is  of  order  y/6,  where  5  is  the  time  discretization  step. 

The  same  rate  of  convergence  has  been  obtained  in  Elliott-Glowinski  [7]  for  a 
different  approximation 

•  on  one  hand,  the  approximation  considered  in  [7]  has  a  probabilistic 
interpretation,  which  is  not  the  case  so  far  for  the  time  discretization  scheme 
presented  here  (however,  see  Remark  3.6  above). 
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•  on  the  other  hand,  the  latter  is  actually  computable,  whereas  no  numerical 
algorithm  is  provide  to  compute  the  approximation  considered  in  [7], 

Another  point  of  interest  would  be  to  study  some  particle  approximation  for  the 
degenerate  second-order  stochastic  PDE,  adapting  the  results  presented  in  Raviart 
[24]  for  deterministic  first-order  PDE. 

As  was  pointed  out  to  the  authors  by  Harold  Kushner  and  the  anonymous 
referee,  one  would  have  to  discretize  the  space  variable  and  to  bound  the  state 
space,  in  order  to  get  a  completely  computable  numerical  scheme.  This  is  a 
different  problem,  for  which  several  approaches  have  already  been  used:  hnite 
difference  approximation,  by  Kushner  [18]  and  DiMasi-Runggaldier  [S],  finite 
element  method,  by  Bennaton  [1]  and  Germani-Piccioni  [9],  with  error  estimate. 
The  reference  [9]  also  provides  error  estimate  for  bounding  the  state  space,  using 
weighted  Sobolev  spaces  introduced  by  Krylov-Rozovskii  [14].  Therefore,  the  time 
discretization  scheme  presented  in  the  paper  should  be  combined  with  such  space 
discretization  techniques,  in  order  to  be  completely  computable.  To  some  extent, 
the  choice  of  the  space  discretization  scheme  is  dependent  on  the  application:  for 
instance,  the  method  of  characteristics  (also  called  particle  approximation  in  [24]) 
is  well-adapted  to  first-order  PDE  arising  in  the  filtering  of  noise-free  processes, 
and  has  been  recently  used  in  target  tracking  applications,  see  Campillo-Le  Gland 
[4]  and  Lasdas-Davis  [19]. 
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APPENDIX 

Proof  of  Stability  and  Commutation  Estimates 

The  purpose  of  this  appendix  is  to  prove  the  stability  and  commutation  estimates 
for  the  approximation  introduced  in  Section  5. 

Proof  of  Proposition  5.3  It  is  enough  to  prove  the  result  for  n  =  0. 

Since  is  not  a  diffeomorphism,  one  can  not  use  a  change  of  variable  as  in  the 
proof  of  Proposition  4.6.  Instead,  one  uses  the  fact  that  and  T,  ,(x)  arc  very 
simple  functions  of  the  Gaussian  random  variable  (T,- 1^).  First 

EtlCf^P = Et  J  ck(ff,./x))ir../x)]*  dx 

*=  [2n(t-s)]*'^  ^  ^  ~  -  Hx){t  -  s)]  -l-  Po(7c){t  -  s))P 


X  exp  {2h*{x)w-  |h(x)|*(t  -  s)  -  2a*(x)  [  w  -  h(x)(  t  -  s)] 
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+  2a(jc)(t  -  s)  +  2ao(Jc)(t  -  *)}  exp  |  dw  dx. 

Next 

2[/«(x)-«W]V-  J^=2|Mx)-a(x)P{t-5)- 

2(t-s)  2(f-s) 

so  that,  using  the  new  variables  (x,e)  with  »=w— 2[h(x)-a(x)](t— s) 


1 


[2«(f-s)r- 


iii\qix-p(x)v 


+  y(x)(t-s))Pcxp|  -  2(1^1'^*’'^*’ 


where  y(x)  ^ po(x)-p(x)[h(x)-2a{xy]. 

In  the  particular  case  where  p(x)=p,  the  application  f(x)^x— pu  +  y(x)(t— s)  is 
a  diffeomorphism  provided  Og(t— s)<l/C,  and  moreover  the  Jacobian  is  bounded 
below  by  [l-C(t-s)].  Therefore,  using  the  new  variables  (y,z)  with  >’=f(x) 

provided  0^(t— 1/C,  which  finishes  the  proof.  □ 

Remark  A.]  According  to  the  detail  of  the  proof  above,  it  is  enough  for  the 
Condition  (A)  to  hold,  that 

[27r(t-s)]^'^  J 1 + yWCf - 5))l^  ”p  I  -  2^?:^}  ^ 

for  any  bounded  function  y. 

Proof  of  Proposition  5.6  Here  again,  it  is  enough  to  prove  the  result  for  n=0 
and  Throughout  the  proof,  the  summation  convention  over  repeated  indices 
J  is  used. 


For  q  smooth  enough,  it  holds 
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CXi  CXj  L  **1 

+qifiax))^i  ^g(x) - |^(x)^ [yf- y.‘-(f-s)Mx)] 
+  (t-5)(^i^a*(x)g(x)+£(x)+^(x)^jrUx) 
=ef|^(x)-  i  ^*(x)[yf-yi-a-s)Mx)]ef|^{x) 

dXi  dXi  dxj 


+  ^I  y5-(‘-^)Mx)]e?g(x) 


Therefore,  under  Condition  (A) 


'gC(r-5)[EtfJefg-[  +  Et|5;-?r] 


SC(t-5)||<j|p.  □ 


PARTICLE  APPROXIMATION 
FOR  FIRST  ORDER  STOCHASTIC 
PARTIAL  DIFFERENTIAL  EQUATIONS* 


Patrick  FLORCHINGERt 
Universite  de  Metz 
Departement  de  Mathematiques 
URA  CNRS  399 
He  du  Sauky 
F-57045  METZ  Cedex 


Frangois  LE  GLAND 
INRIA  Sophia-Antipolis 
Route  des  Lucioles 
F-06565  VALBONNE  Cedex 


Abstract 

A  class  of  degenerate  second  order  stochastic  PDE  is  considered,  for  which  a  rep¬ 
resentation  result  in  terms  of  stochastic  characteristics  has  been  proved  by  Krylov- 
Rozovskii  [2j  and  Kunita  [3,4].  An  example  of  a  stochastic  PDE  in  this  class  has 
been  exhibited  in  Florchinger-LeGland  [1]  as  the  result  of  a  Trotter-like  product 
formula  for  the  Zakai  equation  of  diffusion  processes  observed  in  correlated  noise. 
Particle  approximations  are  introduced  for  this  class  of  stochastic  PDE,  and  er¬ 
ror  estimates  are  provided  which  extend  the  results  of  Raviart  [6]  on  first  order 
deterministic  PDE. 


1  Introduction 

Consider  the  following  stochastic  differential  equation 

dX,  =  b(X,)dt  -f  <t(X,)  [dWt  -  e(Xt)dt]  ,  (1.1) 

where  t  >  0}  is  a  d-dimensional  standard  Wiener  process,  and  the  associated  sto¬ 
chastic  flow  of  diffeomorphisms  0  <  s  <  <},  and  define 

Eo,((x)  =  ^'{io..{x))dW, 

/J|e((o,.(x)pds  +  /Jc(eo..(x))ds}  . 

'Research  partially  supported  by  USACCE  under  Contract  DAJA45-90-C-0008. 

talso  :  INRIA  Lorraine,  CESCOM,  Technopole  de  Metz  2000,  4  rue  Marconi,  F-57070  METZ. 
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Introduce  the  following  partial  differential  operators 


A  ^  .  d 

Bk  =  eic  +  ^  er'k— —  ,  ^  <  k  <  d  , 

i=l 


with  a  =  (7  (T*,  2uid  the  stochastic  PDE 


dq,  =  L'q,dt  +  '£B:q,dW{^ 
*=1 


(1.2) 


Because  of  the  relation  a  =  a  a‘  between  coefficients  of  higher  order  partial  derivatives 
in  operators  L  and  Bk,  equation  (1.2)  is  a  degenerate  second  order  stochastic  PDE  or 
equivalently,  after  transformation  into  Stratonovich  form,  a  first  order  stochastic  PDE. 
Existence  and  representation  results  have  been  obtained  by  Kunita  [4]  for  (generally 
nonlinear)  first  order  stochastic  PDE,  based  on  the  notion  of  stochastic  characteristics. 

In  a  previous  work  [1],  the  ZaJcai  equation  for  the  nonlinear  filtering  of  diffusion  pro¬ 
cesses  observed  in  correlated  noise  has  been  considered.  A  decomposition  of  the  Zakai 
equation  has  been  introduced,  exhibiting  a  degenerate  second  order  stochastic  PDE  sim¬ 
ilar  to  (1.2)  in  the  correction  step.  In  addition,  a  time  discretization  scheme  has  been 
proposed  for  this  degenerate  second  order  stochastic  PDE,  with  rate  of  convergence  of 
order  \/6,  where  S  is  the  time  step. 

The  purpose  of  this  paper  is  to  provide  a  discretization  scheme  of  the  degenerate  second 
order  stochastic  PDE  (1.2)  with  respect  to  the  space  variable  x  €  R”.  This  approximation 
relies  on  the  representation  of  the  solution  in  terms  of  stochastic  characteristics,  and 
approximation  of  the  initial  condition  by  a  convex  linear  combination  of  Dirac  masses. 
This  kind  of  aproximation  is  called  a  particle  approximation,  see  Raviart  [6]. 

More  specifically,  for  any  probability  measure  p{dx)  on  R™,  define  the  transformed 
measure  Qt  p{dx)  by 

{QiM,<i>)  =  j  <^(^o,i(x))Ho,,(x)/i(dx)  ,  (1.3) 

for  any  test  function  or  equivalently 

=  /  .  ^.((x)/j(dx)  . 

•'io.lt*) 

Note  that,  if  ^  is  regular  enough,  then  the  Ito  formula  gives 

d 

Ho,i(x)  ]  =  L<t>(^aAx))  ■  Ho,i(x)  dt  + '^  R^|^(^o,t(x))  ■  Ho,((x)  dlE*  . 

/k=l 
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Therefore  ^i{dx)  =  Qtti{dx)  solves  equation  (1.2)  in  weak  form,  i.e. 


d 

dfit  =  L’ftt  dt  +  '^  Bliit  dWf  ,  iio  =  . 

k=\ 


(1.4) 


Consider  next  the  following  two  different  tissumptions  on  the  original  m-asure  fio{di)  '■ 

□  Assume  that  the  original  measure  fi{dx)  has  a  density  q(x)  with  respect  to  the 
Lebesgue  measure  on  R",  i.e.  (i(dx)  =  q(x)dT.  Then,  the  transformed  measure  Qt  fi{dx) 
has  itself  a  density  qt(x)  which  satisfies 

9((^o,i(a-))  •  JoAx)  =  Ho,,(i)  •  q{x)  , 

or  in  integrated  form 

/  qt{x)dx=  /  EoAx)  ■  qM  dx  . 

Here,  JoA')  '■h®  Jacobian  (i.e.  the  determinant  of  the  Jacobian  matrix)  of  the  sto¬ 
chastic  flow  4o,((  )-  In  addition,  the  density  q,{x)  solves  the  degenerate  second  order 
stochastic  PDE 

d 

dqt  =  L*qt  di  +  Yi  d^V!"  ,  qo  =  q  ■  (1-5) 

k=i 

□  Assume  that  the  original  measure  //(dx)  is  a  convex  linear  combination  of  Dirac 
masses,  also  called  particles 

Adx)  =  X^<i‘^(j  -  x')  , 

where  {o',  i  €  /}  are  the  particle  weights,  and  {x’ ,  i  6  7}  are  the  particle  locations. 
Then,  the  transformed  measure  Qt  p(dx)  has  a  similar  representation 

Q,fi{dx)  =  ^a;5(x-  xj)  , 

•e/ 

where  the  particles  have  been  transported  by  the  flow  i.e.  xj  =  ^o,t(x'),  and  the  weights 
have  been  updated  according  to  a|  =  a'  Eo.((x'), 

The  idea  behind  particle  approximation  for  equation  (1.2)  is  the  following  : 

•  given  an  initial  condition  ^o(dx)  with  density  qo{x),  find  an  approximation  pA^x] 
in  terms  of  a  linear  convex  combination  of  Dirac  masses, 

•  use  the  exact  solution  of  equation  (1.4)  with  the  approximation  /i5(dx)  as  initial 
condition,  as  an  approximation  for  the  solution  of  the  original  equation  (1.5),  and 
get  error  estimate  if  possible. 
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This  can  be  illustrated  by  the  following  diagram 


qo{x)dx  =  iio{dx) 


li^(dx) 


Qt 


Qi 


qt(x)di  =  Ht(dx) 


ti'iidi) 


The  remaining  of  this  section  is  devoted  to  recalling  standard  results  concerning  sto¬ 
chastic  flows  of  diffeomorphisms  and  stochastic  PDE. 


Proposition  1.1  Let  n  >0  be  fixed.  Assume  that 

■  b,  a  and  e  have  bounded  derivatives  up  to  order  {n  -f  1). 
•  c  has  bounded  derivatives  up  to  order  n. 


Then  is  a  C” -diffeomorphism  in  R".  In  addition,  the  following  estimates  hold  for 

all  p  >  \ 


sup  E 

rCR™ 


<  00  , 


I  <  |o|  <  7?  , 


sup 

xeR"* 


<  oo 


0  <  IqI  <  n 


Restricting  to  compact  sets  of  R”*,  it  is  possible  to  invert  the  supremum  and  the 
mathematical  expectation  in  the  estimates  above,  see  the  Corollary  4.6.7  of  Kunita  [5] 


Proposition  1.2  Under  the  assumptions  of  the  Proposition  1. 1,  there  exists  a  constant 
C  >  0,  such  that  for  any  compact  set  B  C  R”*  and  e  >  0  the  following  uniform  estimates 
hold  for  all  p>  I 


E 


sup|Z)“6.<(^)l’’ 

x6fl 


<  C  (1  , 


]  <  |o|  <  n  , 


E 


sup|Z7“E,.,(i)r 

t€B 


<  C  [1  +  S”-^]  , 


where  6  =  6(B)  denotes  the  diameter  of  B. 


0  <  |q1  <  7)  , 
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For  all  n  >  0,  p  >  1,  let  IV"'*’  =  IF"'*’  (R™)  denote  the  space  of  real-valued  Lebesgue- 
measurable  functions  on  R*"  whose  generalized  derivatives  up  to  order  n  are  integrable  in 
p-mean,  and  define  the  corresponding  norm  ||  •  |(„,p  and  semi-norm  |  ■  |„,p  by 

ll“ll^,P=  E  l\D‘'u(x)\^dx  and  |<,  =  E  /  - 

|ol=n 

respectively. 

Consider  the  following  degenerate  second  order  stochastic  PDE 

d 

dq,  =  L-9,  dt  +  E  BUt  dVF/  ,  90  =  9  ■  (1  -6) 

*=i 

Although  no  coercivity  hypothesis  is  satisfied,  the  following  existence,  uniqueness  and 
regularity  result  is  proved  in  Krylov-Rozovskii  [2]. 

Theorem  1.3  Let  n>\  be  fixed.  Assume  that 

■  a  has  bounded  derivatives  up  to  order  ma.x(n,2), 

■  b,  a,  c  and  e  have  bounded  derivatives  up  to  order  n, 

■  the  initial  condition  satisfies  qo  6  VF"’*’. 

Then  equation  (1-6)  has  a  unique  solution  q  g  A/’’(0,T ;  M''"'’’).  In  addition 

geinn;  C^.(10,T];  IF"-'’)), 
and  the  following  estimate  holds 

E[  sup  Wq.tJ  <  ||9o!r„,p  . 


2  Quadrature-based  particle  approximation 

With  the  quadrature  formula  (A.l) 

/ g(x)dx  ~  E*^'  9{^')  > 

•'  .6/ 

is  associated  the  following  particle  approximation  for  the  initial  density  9o(j) 

qoix)dx  -  ^  6(x  -  x*)  .  (2.1) 

»€/ 

This  induces  the  following  particle  approximation  for  the  solution  qi{x)  of  equation  (1.6) 

qt{x}dx  =  fit(dx)  ~  f^tidx)  =  Eo,((x’) 9o(^*)  -  (oA^l)  • 

»€/ 

The  following  error  estimate  holds  in  Sobolev  space  with  negative  exponent,  which 
extends  the  result  of  Raviart  to  the  case  of  first  order  'tochastic  PDE. 
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Theorem  2.1  Let  n>m  be  fixed.  Assume  that 

■  h,  a,  c  and  e  have  bounded  derivatives  up  to  order  (n  +  1), 
•  the  initial  condition  satisfies  go  €  W"-'’. 


Then  there  exists  a  constant  C  >  0  independent  of  h,  such  that 


E\\p,  -  <  C  A"  ||9o||„,p 


Proof.  Let  4>  €  IV'*'’’  be  an  arbitrary  test  function.  Since 

=  I  <i>iiQ.t(x))EQ,t(x)go(x)dx  ,  w’ (/>(^o.((^’))  Ho,,(i’)  9o(x’)  , 

■'  tel 

it  follows  from  Theorem  A. 2  that 


\{tit,<t>)-{ti^t4‘}\<C  h”  iflL, , 

with  g  —  4>  <t  ia.t  ■  ^o,<  go  ,  provided  g  6  VP"’’,  n  >  m. 

Under  the  assumptions  on  the  coefficients,  o  ^o.i  €  IP"-’’'  and  Eq.i  ■  9o  6  H''"'’’,  for 
conjugate  p  and  p'.  Moreover,  the  generalized  Leibniz  formula  yields 

l9ki<  E  flXa.Mx)D^4>((o..(x))D^go(x)ldx  . 

(a, <?)€/„■' 


where  /„  denotes  the  set  of  pairs  (Q,fi)  of  multi-indices  such  that  |q|  +  |/?1  <  n,  and 
are  random  fields  involving  the  derivatives  of  (oA')  ^f>d  Ho.((-)  up  to  order  n.  Using  back 
and  forth  the  changes  of  variable  induced  by  the  differomorphisms  ^o.t(  )  Cat  (■)• 
the  Holder  inequality,  gives 

(o,0)€/„ 

<  E  If  \D°<i>(x)r'dxy^"  {  I  |xo,MCo,l(x))  D^goUo.Hm^ 

{a,P)el^  ^  ■’ 

<  ll?^lln,p<  E  I  I  IXa./j(x)  D%{xW  . 

(o, <?)€/„ 

Therefore 


lkl|n,p' 


<C  h” 


E 

(o.«€/n 


{/ 
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and 


Ell;/,  -  /if  ll-n.,  <  c  fe"  Yi  (  /e  {ix»,^(*)r  [Jo,,(x)]-''’-»} 

(o,0)€/„ 

lD^9o(x)rdx}'^’’  . 


From  estimates  in  Proposition  1.1,  it  holds 


so  that 


sup  E{1xo./3(x)|’’  K,(x)]-'’’-»}  <  oo  , 


E|l/i,-pf||_„,p<Cr  Ikoll 


Regularizat  ion 

Let  C(j)  be  a  continuous  cut-off  function  defined  on  R"*,  which  satisfies 

(t)  J^(x}dx  =  l, 

(ii)  ^1“  C(x)dx  =  0  ,  l<|Q|<it-l, 

{Hi)  y  |xl*  K(x)ldi  <  oo  . 

for  some  k  >2.  For  any  e  >  0,  Ce(x)  is  defined  by  the  following  scaling 


With  the  particle  approximation 


t^Hdx)  =  ^u,'ok(.,(x’)9o(x')  ^(x  -  i;)  , 

»€/ 

is  associated  the  regularized  measure 

til'’(dx)  =  Ce(dx)  =  g^'‘(x)dx  , 

where  the  density  qt'^(x)  is  given  by 

=  ^u;’Ho,,(x’)9o(x’)  C.(x  -  x’,)  . 
te/ 

The  main  result  of  this  section  is  the  following  theorem,  which  is  an  extension  of  the 
Theorem  4.2  in  [6],  to  the  case  of  first  order  stochastic  PDF. 
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Theorem  2.2  Let  n>  m  be  fixed.  Assume  that 


■  the  cut-off  function  ^  satisfies  (i)-(iii)  for  some  k  >2,  and  (  £  , 

■  b,  a,  c  and  e  have  bounded  derivatives  up  to  order  (£  +  1), 

•  the  initial  condition  satisfies  go  £  , 

where  i  =  max(k,n). 

Then,  there  exists  a  constant  C  independent  of  both  h  and  c,  such  that 

{Elk,  -  9, <  C  {e*||9o||*.p +  (/>/£)"  ll9o|U,,}  . 

Proof.  Obviously 

9<  -  9? ■'  =  [9,  -  9<  ♦  C]  +  [9,  *  C  -  9,'''']  ■ 

First,  it  follows  from  Lemma  4.4  in  [6]  that 

Ikf  -  *  GII0.P  <  C  e*  |9il*.p 

provided  q,  £  U’*'''’.  Under  the  assumptions.  Theorem  1.3  gives 

{Elk,  -  9.  *  CIIS.p}''”  <  C  e*  {3k, I?, <  C  s*  ||9o|U,p  . 

On  the  other  hand,  using  the  change  of  variable  induced  by  the  diffeomorphism  ^o,' (  )’ 
it  holds  for  all  x  £  R™ 

9<  *  =  j  ^0.1(2)  90(2)  Ce(a:  -  dz 


-  =  £(9(x,  •)) 

i€/ 

with  g(x,  )  =  Eo,t  90  ■  Ce(^  ~  ^o,<)■  Therefore,  it  follows  from  Theorem  A.l  that  for  all 
X  £  R"* 

I9,  *  Ct(^)  -  9,*''(^)l  <C  h”  lg(x,  •)|„,] 

provided  g(x,  •)  €  VF"'',  n  >  m.  Moreover,  the  generalized  Leibniz  formula  yields 
k(^,  OU,.  <  E  /  D‘^qo{z)D-^Ux  -  ^oAmdx  , 

where /„  denotes  the  set  of  pairs  {a,fiiof  multi-indices  such  that  |q|  +  |^|  <  n,  and  Xa,9(  ) 
are  random  fields  involving  the  derivatives  of  5o,<(')  and  Ho,t(-)  up  to  order  n.  From  the 
technical  lemma  below,  it  follows  that 

/  k(x,  dx<C  Y,  {  /  dxV  {  f  Ixy  T)  D^qo(x]\^ 
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Making  use  of 

D-i-M  =  0"«f ) . 

taking  mathematical  expectation  on  both  sides,  and  raising  to  the  power  1/p  gives 

{e  /  \g(x,.)\l,  <  C  1  IKIk,  E  {  / E{|xU^)r  ko,,(x)]-‘'’-')} 

iz)%(x)rdx}'^’’ . 

From  estimates  in  Proposition  1.1,  it  holds 

sup  E{ixl,;9(x)r  <  00  . 

xeR" 

Therefore 

{E1|9,  »  G  -  KCh"  \eJ  |<?(x,.)|:,,  dx}’^' 

<  C  (h/e)"  IICIU,,  |l9ol|n,p  .  □ 

Lemma  2.3  Lei  f  &  L’’  and  g  €  L' ,  and  define 

/(x)  =  //(x)s(x-4o.,(*-))dx  . 

Then  I  €  L’’  and  in  addition 

{/l/(x)rdx}'^'  <  {/  |/(x)l'’  [Jo.,(x)l-<’’->dx}'"’  J  \9{x)\dx  . 

Proof.  Using  back  and  forth  the  changes  of  variable  induced  by  the  differomorphisms 
^o,((-)  and  ^0,' (■)>  and  the  Lemma  4.3  in  [6],  gives 

/(x)  =  //(4'.!(^))  9i^  ~  , 

and 

{y i/(x)rdx}'''  <  y  \mM)nJoA^o'{^r’’dy''  J \9(=^)\d^ 

<  [J  If  Mr  |do,,(x))-<'-»dx|‘^'’  j  \g(x}\dx  .  □ 
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3  Adapted  particle  approximation 

Consider  the  particle  approximation  (A.3)  for  the  initial  condition  Ho{dx) 

(lo(dx)  ~  fio(dx)  =  52“'  -  *’)  . 

t€/ 

where  the  particle  weights  {a',  t  g  7}  and  the  particle  locations  {x‘ ,  :  €  7}  are  defined 
in  the  following  way 

a'  =  =  f  (io{dx)  ,  x'  =  ~  [  xno{dx)  , 

JB'  O'  JB* 

depending  on  the  measure  fio(dx).  This  induces  the  following  particle  approximation  for 
the  solution  pi(x)  of  equation  (1.4) 

~  ti^{dx)  =  Hx  -  ^oAx'))  ■ 

i€l 

Parallel  to  the  Theorem  2.1  above,  the  following  error  estimate  holds  in  Sobolev  space 
with  negative  exponent. 

Theorem  3.1  Assume  that 

■  b,  a,  c  and  e  have  bounded  derivatives  up  to  order  3, 

•  for  all  i  &  I ,  the  set  B'  C  R”"  is  compact. 

Then  :iere  exists  a  constant  C  >  0,  such  that 

•  €/ 

where  ii’  =  po(B')  and  6,  =  b(B')  denotes  the  diameter  of  the  set  B'. 

PROO’  .  Let  4>  €  be  an  arbitrary  test  function.  Since 

t‘t,<l>)  =  /  <t>{ioAx))'^Ax)  Po(dx)  ,  {Pt’<»)  =  12“' , 

it  folk  vs  from  estimate  (A. 6)  that 

•€/ 
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with  g  =  <j>  o  lo,t  ■  Ho,(,  where  B'  denotes  the  convex  hull  of  B'.  The  generalized  Leibniz 
formula  yields 


l5|2,oo,fl  <  sup|Xo(a:) 

|a|<2 


<  E 

|al<2 


suplxa(-r)! 

x€fl 


[sup  \D''<i>(x) 

1  LxeR’” 


^  Ikib.oo  E  s«Pl>Co(x)|  , 

|o|<2 


where  Xo{')  are  random  fields  involving  the  derivatives  of  ^o,((  )  and  up  to  order  2. 

Therefore 


lkl|2,oo 


<  5E  E  ®“P  ko(i)l  <5?  a'  , 

ie/  |a|<2  l€fl* 


and 

sup  |Xo(t)|  a'  . 
zeS- 

From  estimates  in  Proposition  1.2,  it  holds 


E||/i,-p?|i-2,,  <iE  E  E 

<€/  |o|<2 


E 


sup  l.\o(a-)l 


<  C  [1  +  , 


for  some  p,  where  6i  =  6(B')  denotes  the  diameter  of  both  B'  and  its  convex  hull  B'.  so 
that 

E||p<-pf||-2..  <c^  +  .  □ 

•€/ 


References 

[1]  P.  FLORCHINGER  and  F.  LE  GLAND,  Time-discretization  of  the  Zakai  equa¬ 
tion  for  diffusion  processes  observed  in  correlated  noise.  Stochastics  and  Stochastics 
Reports  35  (4)  233-256  (1991). 

[2]  N.V.  KRYLOV  and  B.L.  ROZOVSKII,  Characteristics  of  degenerating  second- 
order  parabolic  Ito  equations,  J. Soviet  Math.  32  (4)  336-348  (1982). 

[3]  H.  KUNITA,  Stochastic  partial  differential  equations  connected  with  nonlin¬ 
ear  filtering,  in:  Nonlinear  Filtering  and  Stochastic  Control  (Cortona- 1 981) 
(eds.  S.K.Mitter  and  A.Moro)  100-169,  Springer-Verlag  (LNM-972)  (1982). 

[4]  H.  KUNITA,  First  order  partial  differential  equations,  in;  Stochastic  Analysis 
(Katata  and  Kyoto-1982)  (ed.  K.Ito)  249-269,  North-Holland  (1984). 


11 


[5]  H.  KUNITA,  Stochastic  Flows  and  Stochastic  Differential  Equations,  Cambridge 
University  Press  (1990). 

[6]  P.A.  RAVIART,  An  analysis  of  particle  methods,  in;  Numerical  Methods  in 
Fluid  Dynamics  (Como-1983)  (ed.  F.Brezzi)  243-324,  Springer- Verlag  (LNM- 
1127)  (1985). 

A  Particle  approximation  of  functions 

Consider  the  following  quadrature  formula  on  R" 

f  g(x)dx  g(x')  ,  (A.l) 

.€/ 

where  {t*  ,  i  €  /}  is  a  coordinate  grid  of  size  h  >  0.  /  =  Z”"  and  u;'  =  h”*  is  the  Lebesgue 
measure  of  the  m-dimensional  cube  B'  with  center  x'  and  edge  size  k.  For  all  g  £  C(R"‘), 
the  quadrature  error  associated  with  the  quadrature  formula  (A.l)  is  defined  by 

Ei(g)=  j  g(x)dx -u;' g(x')  ,  E(g)  =  ^E,{g). 

16/ 

The  following  estimate  is  proved  in  Raviart  [6] 

Theorem  A.l  There  is  a  constant  C  >  0  independent  of  h  such  that 

|£(<?)l  <  c  h”  |p|„.,  , 

for  all  g  6  H'"  *,  n  >m. 

Let  p{di)  be  a  probability  measure  on  R”*  having  a  continuous  density  q(x)  with 
respect  to  the  Lebesgue  measure,  i.e.  p{dx)  =  q{x)  dx.  With  the  quadrature  formula  (A.l) 
is  associated  the  following  particle  approximation  for  the  density  q(x) 

9(x)  dx  =  p{dx)  ~  /(di)  =  q(x')  8(x  -  x')  ,  (A.2) 

»€i 

SO  that,  for  any  test  function  <t> 

=  j  <t>{x)q{x)dx  ,  {p!',(f)  =  <l>(x')q{x')  . 

•'  •€/ 

The  following  result  is  proved  in  Ra'dart  [6] 

Theorem  A.2  There  is  a  constant  C  >  0  independent  of  h  such  that 

II//  -  //Nl-n.p  <Ch"  ||?||„,p  , 

for  all  q  £  VP"'/’,  n  >  m. 
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Proof.  From  Theorem  A.l,  it  holds 


\{n,<t>)-(,i\4>)\  =  \E(g)\<C  h"  IsU,,  , 

with  g  —  4)  ■  q  1  provided  g  G  VP"-',  n  >  m.  The  generalized  Leibniz  formula  and  the 
Holder  inequality  yield 

IffU.,  <  C  ||^||„.p,  |1<J|U,, , 

for  conjugate  p  and  p',  and  therefore 


sup 


|(p,^)-(pN^}| 

MUy 


□ 


Another  possible  approximation  is  to  consider  a  partition  {S' ,  ?  G  /}  of  R'",  and  to 
define  the  following  particle  approximation  for  the  probability  measure  p{dx) 

fi(dx)  ^  =  ^a*  6{x  -  j‘)  ,  (A. 3) 

«€/ 

where  the  particle  weights  {a',  i  G  /}  and  the  particle  locations  {x’ ,  t  G  /}  are  defined 
in  the  following  way 

a' t  p(B')  =  f  p{dx)  ,  /  xp(dx).  (A,4) 

Jb*  a  JB' 

depending  on  the  measure  /i(dx)  so  that,  for  any  lest  function  0 

{p,<l>)=  f  4>{i)  (i(dx)  ,  =  ^aV(x’)  . 

•'  le; 

For  all  <j>  G  C(R”’),  the  quadrature  error  associated  with  the  formula  (A, 3),  is  defined  by 

E'(4>)  =  /  4>(x)Mdx)  -  a'  4(x')  ,  £'(4)  =  ■ 

.6/ 

Parallel  to  the  Theorem  A, 2  above,  the  following  result  holds 
Theorem  A. 3  For  any  partition  {B' ,  i  G  /} 

■€/ 

where  a'  =  p{B')  and  Si  =  S(B’)  denotes  the  diameter  of  the  set  B' . 
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Proof.  Let  4>  6  be  an  arbitrary  test-function.  Using  Taylor  expansion  around 

the  point  x  =  x'  yields 

^(i)  =  4>(x')  +  (x  -  x')’  D<i>{x') 


+  (x  —  X*)’  (1  —  u)  D^(l>[ux  +  (1  —  u)x‘]  du|  (i  —  x‘)  , 

and  the  definition  (A.4)  gives 

=  J  {x  —  x‘)’  (1  —  u)  D^<i>\ux  +  (1  —  u)x’]  du|  (x  —  x’)  dx  . 


Therefore 


<  \  I^l2,cc,g.  X.  Ik  -  ^^(dx)  <  \  1^1,^  g.  6^,  a'  , 


where  B'  denotes  the  convex  hull  of  B'.  Then 

i(p,^)  -  =  \E\d>)\  <  i  “■  ’ 

and 


.6/ 


Iklb.oc  “  ’  ig; 


Ik  -  /II-...  =  sup  ° 


(A.6) 


Remark  A.4  If  the  partition  {Bi ,  i  e  /}  is  given,  with  S,  <  C  h  for  all  i  e  I,  then 

lk-/||-2.■<CA^ 

On  the  other  hand,  if  the  partition  {B, ,  i  6  7}  has  to  be  chosen  so  as  to  make  the 
quadrature  error  as  small  as  possible,  then  <stimate  (A. 5)  can  be  used  to  derive  the 
following  criterion 

6^  a'  =  c  for  all  i  €  7  . 

This  criterion  based  on  equidislrihution  of  the  local  quadrature  error,  has  the  following 
interesting  property 

•  a  set  with  a  large  mass,  will  be  split  into  some  smaller  subsets, 

•  conversely,  neighbouring  sets  with  small  masses,  will  be  packed  together  into  one 
single  set. 
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I  Models  of  Partially  Observed  Systems 

We  consider  partially  observed  systems  of  the  form 

f  dXt  =  b(X,)dt  +  (T{Xt)dWt 

(1) 

(  dY,  =  hiX,)dt  +  dV, 

where  {W, ,  <  >  0}  and  {VI,  t  >  0}  are  independent  Wiener  processes  of  appropriate  dimensions, 
with  covariance  matrices  I  and  R  respectively.  We  are  interested  in  the  state  estimation  problem, 
under  various  hypotheses  concerning  a  =  o  a'  and  R. 

Consider  first  the  two  extreme  cases  :  If  a  =  0  and  R  =  0,  we  are  dealing  with  an  observer 
problem  for  the  deterministic  system 

i  X,  =  b(X,) 

(2) 

[  2.  =  h{X,) 

At  the  other  extreme,  if  R  is  non-singular,  we  are  dealing  with  a  nonlinear  filtering  problem  for  the 
diffusion  process  (1).  We  can  also  easily  handle  the  following  intermediate  case  :  If  R  is  non-singular 
and  a  =  0,  we  are  again  dealing  with  a  nonlinear  filtering  problem  but  the  state  equation  is  now  an 

ODE 

j  X,  =  b(X,) 

\  dYt  =  h(X,)dt-\-dVt 

Let  us  point  out  that  the  solution  of  the  state  estimation  problem  is  radically  different,  depending 
on  whether  R  is  non-singular  or  identically  zero.  On  the  other  hand,  whether  a  is  non  singular, 
singular  or  identically  zero  only  affects  the  algorithms  to  be  used. 

Our  purpose  is  to  present,  for  each  of  the  three  main  cases  described  above,  a  solution  to  the  state 
estimation  problem,  and  to  suggest  some  numerical  approximation  procedures.  The  general  idea  is 
to  study  the  asymptotics  R  —*  0.  As  a  by-product,  we  expect  to  obtain  some  numerical  algorithms 
for  the  nonlinear  filtering  problem,  that  are  robust  when  the  non-singultu  matrix  R  is  small. 
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II  Solutions  to  State  Estimation  Problems 

We  assume  for  simplicity  that  b  6  C^(R'",R"')  and  h  €  Cj(R'",R’’),  unless  otherwise  stated. 
Let  us  begin  with  the  nonlinear  filtering  (NLF)  problem. 


□  When  R  is  non-singuliir,  the  Bayesism  approach  to  the  state  estimation  problem  is  to  compute  the 
unnormalized  conditional  probability  distribution  iit{dx)  of  the  state  Xt,  given  the  past  observations 
J’l  =  <7(y', ,  0  <  s  <  <).  By  definition 

(/i.,/)  =  Et[/(X.)  z,  I  y,] , 


for  any  test  function  /,  where 

Z:  =  expy‘h'iXr)R-UYr-^J^\hiXr)\l-,dT'^  and  Z,  =  Z°, 


and  is  the  reference  probability  measure.  The  probability  distribution  Ptidx)  satisfies  a  stochastic 
PDE  in  weak  sense.  Usually,  this  p.d.f.  h2is  a  density  w.r.t.  the  Lebesgue  metisure,  i.e.  ftt{dx)  = 
Pt{i)dx.  A  sufficient  condition  for  this  to  hold,  is  that  the  probability  distribution  Po{dx)  of  the 
initial  condition  h'ts  already  a  density  w.r.t.  the  Lebesgue  measure,  i.e.  poidx)  =  po(x)dx.  We 
will  assume  that  po{x)  >  0  for  all  x  €  R”*.  The  unnormalized  conditional  density  Pt(x)  is  the  unique 
solution  of  the  Zakai  equation 

dp,  —  L'p,  dt  +  p,h'R~'  dY,  ,  (3) 


with  initial  condition  po(a:),  where  L‘  is  the  formal  adjoint  of  the  second-order  partial  differential 
operator 


L  = 


associated  with  the  SDE 

dX,  =  b{X,)dt  +  a{X,)dW,  . 


(4) 


□  If  in  addition  r  =  0,  then  the  Zakai  equation  (3)  becomes  a  first  order  stochastic  PDE.  for  which 
a  representation  result  is  available  in  terms  of  the  flow  $i(a;)  associated  with  the  ODE 

X,  =  b(X,)  .  (5) 


Actually,  define 

Jt-,(x)  =  det[  ]  =  exp  div  b(^r-,(x))  dr  J  , 

5.,,(x)  =  exp  h'(^r-.{x))R-^  dYr  -  i  |h($._.(x))|J,-.  dr}  . 

In  this  case,  the  unique  solution  of  the  Zakai  equation  (3)  satisfies 

Pt{9,-,{x))  ■  J,-,(x)  =  5,,,(x)  •  p,(x)  ,  (6) 

or  equivalently,  introducing  the  logarithmic  transform  W,(t)  =  —  logp((x) 

lU,($,_.(x))  -  log  J,_,(x)  =  W,(i)  -  log£,,,(x)  .  (7) 
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We  turn  now  to  the  observer  problem. 

Let  {x* ,  0  <  t  <  T}  denote  the  true  state  trajectory  producing  the  available  observation  tra¬ 
jectory  {2( ,  0  <  /  <  T}.  The  idea  is  to  build  an  observer  by  considering  the  limit  of  a  sequence  of 
nonlinear  filtering  problems  with  noise  covariances  going  to  zero.  Two  different  cases  are  possible 

•  Introduce  small  noises  of  similar  intensities  in  both  the  state  equation  and  the  observation,  i.e. 
set  a  =  el  and  R  =  el, 

■  Introduce  a  small  noise  in  the  observation  only,  i.e.  set  a  =  0  and  R  =  el. 


□  In  the  first  case,  it  is  proved  in  Jaunes  [2]  that 

-elogpj(i)  m;(i) 


in  probability  uniformly  on  compact  subsets  of  x  €  R"*,  where  up  to  an  additive  constant  independent 
of  X,  m[(x)  is  the  unique  solution  of  the  Hamilton-Jacobi  equation 


at 


dm[ 


dx 


+  b- 


dm', 

dx 


V,  =  0 


(S) 


with  initial  condition  mfl(x)  =  0,  in  the  viscosity  sense,  where 

=  \\zt  -  . 

In  addition,  mj(x)  is  the  value  function  associated  with  the  following  control  problem.  Introduce 
first  the  action  functional  ^ 

^(4)  =  ?/  t-b{i,)\Us 

if  ^  €  <^([0,7] ;  R’")  is  absolutely  continuous,  and  /,(^)  =  +00  otherwise.  Define  also 

F,(o  =  /'  v;(6) ds  =  i  f  |2,  -  ds  . 

Jo  Jo 


Then 


m:(x)  =  inf  {7,(0 +  f.(0  :  e<  =  x}  . 

Clearly  mj(x)  >  0  and  m',(x’)  =  0  for  the  true  state  trajectory,  and  we  define  our  observer  as  the  set 

x|  =  argmin  m',{x)  =  {x  g  R"  :  mj(x)  =  o|  .  (9) 

icR”* 


Obviously  x'  g  ij  for  all  <  >  0.  It  is  proved  in  James  [3]  that,  provided  the  deterministic  system  (2) 
is  observable  on  [0,  T]  (i.e.  the  map  lo  {z, ,  0  <  s  <  t)  is  injective),  the  set-valued  observer  (9) 
is  actually  a  finite-time  observer  (FTO)  on  (0,  /]  (meaning  that  xj  is  defined  in  terms  of  a  recursive 
systeiji  with  the  property  that  x',  =  {x*}  for  all  t  >  T). 


3 


□  In  the  second  case,  it  follows  from  equation  (7)  that 


-£logP5(i)  =  '^t(x)  , 


in  probability  uniformly  on  compact  subsets  of  i  £  R”*,  where  up  to  an  additive  constant  independent 
of  X,  mt{x)  is  given  by 

Tnt(^,(x))  =  f  V,(<I>^(i))ds  or  m,(i)  =  f  V,{^~2,ix))  ds  , 

Jo  Jo 

i.e,  m,(x)  =  where  is  the  unique  solution  of  the  ODE  (5)  ending  in  x  at  time  t.  In 

addition,  mt{x)  is  the  unique  solution  of  the  linear  first -order  PDE 


drrit  ,  dm, 


-  v,  =  0  , 


satisfying  the  initial  condition  mo  =  0.  Just  as  above,  it  is  clear  that  m,(x)  >  0  and  m,(x')  =  0  for 
the  true  state  trajectory,  and  we  define  our  observer  as  the  set 

X,  =  argmin  m,(i)  =  |x  €  R”"  ;  m,{x)  =  o|  .  (101 

leR™  ^ 


Here  again,  it  is  obvious  that  x"  6  x,  for  all  <  >  0,  and  in  addition  the  set-valued  observer  defined 
by  (10)  is  actually  a  FTO  on  [0,7’],  provided  the  deterministic  system  (2)  is  observable  on  [0.  Tj. 

Note  that  mt{x)  =  F((4‘"^)  where  lt{C’^)  =  0  (i-C-  solves  the  ODE  (5)  exactly)  and  =  x. 
whereas  in  the  definition  of  mj(x),  a  penalty  /,(^)  is  put  on  those  trajectories  ^  that  do  not  solve 
the  ODE  (5),  This  is  a  less  severe  requirement,  and  is  reflected  in  the  relation  m',(i)  <  nijjx).  Note 
however  that  x,  =  xj.  This  is  the  set  of  those  points  that  are  indistinguishable  from  the  true  state  x'. 
In  conclusion,  the  observer  (10)  is  more  precise  than  the  observer  (9),  whereas  the  latter  is  expected 
to  be  more  robust  w.r.t.  modeling  errors. 


Ill  Numerical  Approximation 

In  this  section,  we  restrict  ourselves  to  the  situation  where  the  state  satisfies  an  ODE.  in  which  ca.se 
the  solution  to  the  NLF  problem  is  given  by  (6),  where  R  is  non-singular,  and  the  correspondine 
Fl'O  is  given  by  (10).  where  /?  =  0. 

Concerning  the  approximation  of  the  NLF  (6),  we  wish  to  compute  an  approximate  normal¬ 
ized  conditional  density  pf'*(i)  (where  A  and  6  denote  the  time  discretization  step  and  the  sjiace 
discretization  step  respectively)  with  the  following  property 

(*)  cus  A,6  ],  0 

^  Ir’”  for  all  t  >  0  , 

where  c,  is  a  normalization  constant. 

Concerning  the  approximation  of  the  FTO  (10),  our  approach  is  to  build  a  family  xf'^  with  the 
following  property 
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(**)  if  the  deterministic  system  (2)  is  observable  on  [0,Tj,  then  as  A,  <5  i  0 

foralH>r. 

A  necessary  and  sufficient  condition  for  (**)  to  hold  is  dist(X[,y^],  Xi)  — *  0  as  A,<f  i  0.  The  approxi¬ 
mate  observer  will  be  defined  in  terms  of  an  approximate  value  function  tn^  (x),  i*e. 

ff’'  =  {x  €  R”*  :  mf-‘(i)  <  c^'*}  , 

and  a  sufficient  condition  for  (**)  to  hold  is  c^-‘  1  0  and  m|^4,(x)  -  m,(x)  uniformly  on  compact 
subsets  of  R™,  as  A,  6  j  0. 


Vk 


Time  Discretization 

Consider  a  uniform  partition  0  =  to  <■<<*<•••  interval  [0,oo),  with  time  step 

^  —  tk  —  tk-i-  The  first  step  is  to  sample  the  available  observation  trajectory. 

The  nonlinear  filtering  problem.  If  noisy  observations  {K, ,  <  >  0}  are  available,  we  first  build  tlie 
following  sequence  of  compressed  observations 

and  we  use  the  approximate  model 

f  X,  =  b(X.) 

J  ill) 

1  =  + 

where  (tif  ,  ifc  =  1, 2,  •  •  • }  is  a  Gaussian  w'hite  noise  sequence  with  covariance  matrix  /?/A 

The  solution  of  the  NLT  problem  for  the  approximate  model  ( 1 1 )  is  given  in  terms  of  the  a  pnon 
and  a  posteriori  conditional  probability  densities  defined  by 

pf  i{x)dx  =  €  dx  I  ^i^j)  and  pf(x)dx  =  P[Xi^  €  dx  \  . 

respectively,  where  =  .r(yP,- •  •  ,yf  )■  The  transition  from  pt,(x)  to  pf{x)  is  divided  into  two 
steps 

■  prediction  step  :  Transport  by  the  flow  gives  =  To.  Pk-M)  where  {7,  ,  f  >  0}  is  the 

semigroup  Associated  with  the  linear  first-order  PDE 


dt 


r  =  T  P( 


An  explicit  solution  is  available  for  tnis  equation 

or  equivalently 

for  all  Borel  set  A  C  R”"- 


(13) 


(141 
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(15) 


•  correction  step  :  According  to  the  Bayes  formula 

*  2 

where 

4'f(i)  =  exp{-iA|j/f-h(i)|^_i}  , 

is  the  likelihood  function  for  the  estimation  of  Xi^  in  the  approximate  model  (11),  based  on  the 
observation  alone,  and  c*  is  a  normalization  constant. 

Introducing  the  logarithmic  transform  =  —  logpf(x),  it  follows  from  from  (13)  and  (15) 

that 

<(x)  -  \0gM<P2.'{x))  =  -logc*  +  +  iA  lyf  -  h{x)\l-.  .  (16) 

The  observer  problem.  If  perfect  observations  {zt,t  >  0}  are  available,  i.e.  R  =  0,  we  can  simply 
use  Zk  =  and  our  model  becomes 


f  X,  =  h(X,) 

I  =  Hx,,) 


(!') 


Introducing  the  asymptotics  R  —  el  in  the  NLF  problem  and  sending  e  to  zero,  it  follows  from 
equation  (16)  that 

-£logpf’'(2')  =  mf(x)  . 

in  probability  uniformly  on  compact  subsets  of  x  €  R”*,  where  m^(x)  satisfies  the  following  relation 

mf  (x)  =  mf_,($^'(a^))  +  A  (x)  . 

where 

and  =  ^  f  z,ds  =  ^  h(X,)ds  +  r*  . 

A  ./(t-i  A  7!*., 

It  is  clear  that  mf  (x)  >  0.  However,  because  the  averaged  observation  zf  used  in  the  definition  of 
Tnf{x)  is  different  from  the  actual  observation  x*,  we  have  14^ (x’^)  5^  0  in  general  for  the  true  slate 
trajectory.  Therefore,  we  decide  to  use  the  actual  observation  Zk  in  the  definition  of  mf  (x).  instead 
of  the  averaged  observation  xf ,  i.e. 

mf(x)  =  mf.,(4>^'(x))-|- A  V4(x)  .  (iSi 

where 

Vk{x)  =  \\zk-h(x)\^  . 

This  relation  can  be  divided  into  two  steps 


prediction  step  :  Transport  by  the  flow  gives  rn9_i(x)  =  ttt*-i(i)  where  {5( ,  t  >  0}  is  the 

*  7 

semigroup  cissociated  with  the  linear  first-order  PDE 


dmt  drrit 


=  0  . 


(19) 


An  explicit  solution  is  available  for  this  equation 


(20) 
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•  correction  step  :  The  contribution  of  the  new  observation  Zk  to  the  approximate  value  function 
is  given  by 

(i)  =  mf_i(jr)  +  A  V*(i)  . 

We  note  that 

=  and  mf {:r)  = , 

where  is  the  unique  solution  of  the  ODE  (5)  ending  in  i  at  time  t,  and  the  functional  F^{() 
satisfies  for  all  {  €  C((0,T] ;  R”*) 

=  Fktk(0  +  A  =  A  {V,(6.  )  +  •••  +  . 

Now  it  is  clear  that  (i)  >  0  and  =  0  for  the  true  state  trajectory,  and  we  define  our 

observer  as  the  set 

xf  =  argmin  mf(x)  =  (i  G  R”*  :  m^(x)  =  o|  .  (21) 

Obviously  G  i*  for  all  k,  and  one  can  verify  using  the  explicit  formulas  that 

uniformly  on  compact  subsets  as  A  i  0,  with  the  consequence  that  property  (**)  holds  for  this 

discrete-time  approximation. 


Model  Approximation  and  PDE  Discretization 

To  obtain  computable  algorithms,  it  is  necessary  to  discretize  the  linear  first-order  PDE  (12)  or  (19) 
involved  in  the  prediction  step.  Generally  speaking,  two  classes  of  methods  can  be  used  :  in  the  finite 
difference  approximation  (FD)  a  fixed  bounded  grid  is  used,  and  partial  differential  operators  are 
approximated  by  finite  differences  on  grid  points,  whereas  in  the  flow-based  approximation  (FLOW) 
the  explicit  representation  (13)  or  (20)  is  used  to  move  grid  points  (or  alternatively  cells)  along  the 
flow  of  the  ODE  (5). 

A  Finite  Difference 

A  finite  difference  nonlinear  filter.  To  derive  a  finite  difference  algorithm,  we  must  first  constrain 
the  nonlinear  filtering  problem  to  a  bounded  domain.  Let  D  C  R"*  be  a  m-dimensional  open  cube. 
After  Dupuis-Ishii  [1],  we  constrain  the  ODE  (5)  to  the  convex  set  D  as  follows.  For  x  G  dD,  let 
i/(x)  =  {  t'  G  R*"  :  |i^|  =  1,  {ie,x  —  z)  <  0  for  all  z  G  D  }  denote  the  set  of  inward  unit  normals. 
For  T  G  5,  V  G  R"*,  the  projection  5r(x,r)  of  the  velocity  vector  u  at  i  is  given  by  r  if  j  G  D. 
or  ti  +  [(t),  — r)}  V  0]  i/*(x,t')  if  X  G  dD,  where  i/*(x,v)  is  an  element  of  u(x)  which  maximizes 
(i),  — z/),  1/  G  t'(x).  Define  then  6(x)  =  x(x,6(x)),  x  G  D.  By  Theorem  5.1  of  Dupuis-Ishii  [1],  there 
exists  a  unique  absolutely  continuous  solution  of  the  constrained  ODE 

i.  =  b(U  a.e.0<s<<  (22) 

satisfying  (o  =  x  &  D. 

A  finite  difference  algorithm  is  obtained  using  a  Markov  chain  scheme  similar  to  those  described 
in  Kushner  [5].  Let  RJ"  denote  a  coordinate  grid  of  size  6  >  0.  W^e  define  a  system  of  neighborhoods 


Ns(x)  =  {  z  6  R™  :  2  =  I  or  z  =  a;  ±  6ei  for  some  t  =  1, . . .  ,m  }  for  i  6  R”,  where  e,  6  R"* 
denotes  the  j-th  unit  vector.  We  define  next  =  D  n  R^,  =  {  x  E  :  Ni{x)  C  ), 

and  dD^  =  \  D^.  We  define  the  jump  intensity  matrix  Ls(x,z)  of  a  pure  jump  Markov  process 

1  ^  ^  0}  taking  values  in  by 


Li{x,z) 


-|6(i)|i/^  if  2  =  I, 

bf{x)/6  if  2  =  ar  ±  6e,  and  i  =  1, . . .  ,m 
0  if  2  ^  Ni{x), 


(23) 


with  the  notation  |u|i  =  |ui|  +  •■  ■  +  for  any  u  =  (uj,  -  ■  ■  ,Um}-  If  we  use  an  implicit  time 
discretization  scheme,  we  obtain  the  finite  difference  equation 

pf’^x)-A  X:  L,'(i,2)pf’^(2)  =  c*.4.f(i).pf:^,(x),  (24) 

jeN<(x) 


for  X  £  and  k  =  1,  •  ■  where  c^  is  a  normalization  constant,  and  the  initial  condition  p^’^(j)  is 
a  suitable  approximation  of  the  density  po(x)-  This  relation  can  be  divided  into  two  steps 


•  prediction  step  :  Transport  by  the  flow  gives 

[/-AL;]p^:‘,(x)  =  pt:^,(x). 

•  correction  step  :  According  to  the  Bayes  formula 

pf-V)  =  c*-'I'i'{x)-p,^'.(x), 

*  2 

where  c*  is  a  normalization  constant. 

The  following  result  is  proved  in  Kushner  [5]  using  weak  convergence  A'*  =>  A',  as  6  i  0. 


Theorem  1  Property  (*)  holds  for  the  finite  difference  nonlinear  filtering  algorithm. 


A  finite  difference  observer.  To  derive  a  finite  difference  algorithm,  we  still  need  to  constrain  the 
observer  problem  to  a  bounded  domciin.  However,  because  we  are  going  to  approximate  (18).  we 
must  consider  the  ODE  (5)  as  running  backward  in  time,  before  we  constrain  it  to  the  convex 
set  D.  We  use  the  same  definition  as  above  for  the  set  i/(i)  of  inward  unit  normals.  For  x  € 
D,  V  E  R"*,  the  projection  tr(x,v)  of  the  velocity  vector  v  at  i  is  now  given  by  r  if  x  6  D.  or 
V  +  [{f,  ly'ix,  v))  V  0]  v*(i,  u)  if  X  €  dD,  where  i/*(x,  v)  is  an  element  of  v(x)  which  maximizes  [i\  n). 
v  €  i'(x).  Define  then  6(x)  =  — 7r(x,  — 6(x)),  x  E  D  By  Theorem  5.1  of  Dupuis-Ishii  [l]  again,  there 
exists  a  unique  absolutely  continuous  solution  f  of  the  constrained  ODE 

=  6«.)  a.e.  0<s<<,  (2.D; 


satisfying  =  x  E  D. 
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Select  0  £  C(R"*)  non-negative,  0  =  0  on  D'  C  D,  with  D'  n  dD  =  0,  and  0  >  0  on  dD.  Nov.- 
define  the  value  function  for  x  €  D,  t>0  hy 


m.(i)  =  0{io)  +  j[V.(6)  +  0U.)]  ds  , 


where  ^  is  the  solution  of  the  constrained  ODE  (25).  Then  mt(x)  is  the  unique  viscosity  solution  of 
the  Hamilton-Jacobi  equation,  see  Lions  [6] 

^  +  =  0  inDx(0,S], 

(27) 

dfn 

—V  ■  =0  on  dD  X  (0,  S]  , 

ax 

satisfying  the  initial  condition  mo(x)  =  0(x)  for  x  €  D.  In  addition,  m  satisfies  in  the  viscosity  sense 

^+^^-V,-;9  =  0  ondDx(0,S].  (28) 

Define  the  corresponding  observer  as  the  set 

f(  =  argmin  m,(x)  =  €  R"*  :  mi(x)  =  o}  .  (29) 

reR"  ^ 

Let  J  =  {lo  6  D'  :  $,(xo)  €  D',  0  <  s  <  5}.  If  xj  6  I,  then  x*  £  i,  for  all  0  <  <  <  S,  and 
the  observer  (29)  defines  a  FTO  provided  the  deterministic  system  (2)  is  observable  on  [O.r].  see 
James  [4], 

We  again  use  a  Markov  chain  finite  difference  scheme.  However,  we  discretize  the  boundary 
equation  (28)  rather  than  the  boundary  condition  in  (27).  We  use  the  same  definition  as  above  for 
the  grid  R™,  the  system  -of  neighborhoods  N((x),  and  the  subsets  D*,  and  dD^  =  D’’  \  of 
the  grid  R™.  Assume  that  v  =  6/ A  is  a  fixed  real  number,  indicating  the  “speed”  of  the  algorithm, 
satisfying 

u  >  max  |fe(x)|i  ,  (30) 

We  define  the  transition  probabilities  7r^'^(x,2)  =  ^  I  ^  ®  backward  Markov 

chain  ,  k  =  [S/A], . . . ,  0}  by 

f  1  -  t6(x)|i/u  if  2  =  X, 


7r^'‘(x,2)  =  {  bf(x)/v 


if  z  =  X  ±  be,  and  1  =  1, 


(0  if  2  ^  N((x)  . 

Note  that  j  Cf =  x]  =  -A  6(x). 

If  we  replace  $^'(x)  in  (18)  by  the  state  of  the  backward  Markov  chain  starting  at  =  x 
and  take  expectations,  we  obtain  the  finite  difference  equation 

E  ’r^-‘(x,^)»Rf-,(^)  +  A[F*(x)  +  /?(x)],  (32) 

for  X  £  D*  and  k  =  1, . . . ,  [S/A],  with  initial  condition  m^  *(x)  =  0(i).  This  relation  can  be  divided 
into  two  steps 
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•  prediction  step  :  Transport  by  the  flow  gives 

•  correction  step  :  The  contribution  of  the  new  observation  Zk  to  the  approximate  value  function 
is  given  by 

f”*  ■*(^)  =  "‘f (^)  +  ^  [^(i)  +  0(x)]  ■ 

*  2 

The  finite  difference  observer  set  is  defined  by 

if'*  =  tirgmin  mf'*(i)  .  (33) 

leC' 

Obviously,  there  is  no  reason  for  this  approximate  observer  to  satisfy  the  non-asymptotic  consistency 
property  :  in  general  we  can  not  guarantee  that  ij^  €  if'*. 

The  following  result  is  proved  in  James  [4]. 

Theorem  2  //ij  6  I,  then  property  (**)  holds  for  the  finite  difference  observer  algorithm. 

Remark  3  It  is  also  shown  in  [4]  that  under  additional  regularity  assumptions  dist(fi^^^]. x")  = 
0{\/6)  as  8  10. 

Remark  4  The  speed  constraint  (30)  which  appears  in  the  finite  difference  observer  algorithm  is 
actually  a  stability  condition  for  the  explicit  time-discretization  scheme  used  in  (32).  From  the 
probabilistic  point  of  view,  it  ensures  that  (31)  defines  transition  probabilities.  We  do  not  need  a 
similar  constraint  for  the  finite  difference  NLF  algorithm,  because  we  are  using  there  an  implicit 
time-discretization  scheme. 

B  Flow-Based  Approximation 

Let  us  first  describe  the  approximate  model  we  are  going  to  use. 

We  assume  that  at  each  time  f*,  a  partition  ,  i  €  /*}  of  the  state  space  R"  is  given,  and  we 
define  the  discrete  /^-valued  state  n*  by  the  relation 

{n*  =  i}  =  {.Y,.  €  Bi}  .  (3-1) 

The  idea  behind  our  approximation  is  to  suppose  that,  at  each  step  of  the  algorithm,  any  information 
(e.g.  probability  distribution,  likelihood  function,  value  function)  about  the  continuous  state  A,,  is 
immediately  compressed  into  some  information  about  the  discrete  state  n^.  We  can  think  of  memory 
constraint  as  a  justification  for  this  compression  mechanism.  As  a  consequence,  whenever  information 
is  needed  about  the  continuous  state  Xt^,  it  has  to  be  deduced  from  the  corresponding  information 
about  the  discrete  state  nj,  resulting  in  compression  error. 

Making  explicit  use  of  the  flow  associated  with  (5),  we  have 

e  B;}  =  6  4>Z'(Bi)}  =  U  e  Bl_,  n  <l.i'(Bi)}  .  (35) 


10 


where 


=  {i  e  /*-!  :  Bi_,  n  #  0}  , 

provided  ^  solves  the  ODE  (5).  Notice  that  in  generril  the  set  has  finite  cardinality. 

Various  possible  choices  are  available  for  the  partitions,  e.g. 

•  =  /*_!  =  /*  and  Bl  =  for  all  i  €  /*.  In  this  case  =  n^_],  i.e.  the  discrete  state 

process  is  constant  over  time,  but  the  sets  Bl  can  become  very  complicated  after  some  .deps. 

•  /f  =  =  /*  and  Bl  =  Bl_i  for  all  t  €  7*.  In  this  case,  the  partition  is  constant  over  time, 

but  updating  the  discrete  state  can  be  cumbersome. 


Between  these  two  extreme  cases,  a  trade-off  has  to  be  found  in  order  to  reduce  the  computational 
burden  of  updating  both  the  partition  and  the  discrete  probability  distribution  :  Bl  should  both  be 
“close”  to  and  have  a  simple  geometry. 

A  flow-based  nonlinear  filter.  According  to  our  approximation  approach,  we  introduce  the  discrete 
a  priori  and  a  posteriori  conditional  probability  distributions 


=  P{X^,  €  Bl  I  yt,)  and  p',  =  P(X\  e  Bl  |  yf) , 
respectively,  where  again  =  <r(yf ,  •  •  •  )-  Making  use  of  (35)  transport  by  the  flow  gives 

K-I  =  L  e  Bi_,  n I  ytx) 

,g/f  [it  MBl.yni^i^'iBi)) 


L  J  pU.) 

^  ^  y  MBl.,nt^-^\Bl)) 

/.fen 


dx 


Next,  according  to  the  Bayes  formula 

h'k  =  Ck  /  'iif(x)pf_,(x)dx 


-  Ci  •  pI^i  •  maxtltf  (t)  , 


where  ct  is  a  normalization  constant.  This  approximation  can  be  justified  in  the  small  noise  case, 
using  the  Laplace  asymptotic  formula. 
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To  obtain  a  computable  algorithm,  we  introduce  new  discrete  probability  distributions  ^  and 
and  the  corresponding  densities 

=  and  =  m’JA(BL)  iff  i  €  BL  . 

We  then  define  the  transition  from  {/i't-i ,  «  ^  ^*-1}  to  {/ij  ,  2  P  /*},  by  the  following  two  steps 
■  prediction  step  :  Transport  by  the  flow  gives 


,  A(Bi_,  n<l.x’ (Bi)) 

P*-i  ' 


KBi. 


■  correction  step  :  The  contribution  of  the  new  observation  is  given  by 

/2i  =  c*  -B-  ,  (36) 

where  c*  is  a  normalization  constant,  and 

B'k  =  max^f(i)  , 

xeB; 

is  the  generalized  likelihood  function  for  the  estimation  of  nl  based  on  the  observation  t/f  alone. 


Theorem  5  In  the  case  =  /‘,  let  {B^,  i  6  /^}  denote  a  finite  partition  of  a  bounded 

domain  D  with  diam(Bo)  <  S.  Then  property  (*)  holds  for  this  flow-based  nonlinear  filtering  algo¬ 
rithm. 


A  flow-based  observer.  According  to  our  approximation  approach,  we  introduce  the  0  priori  and  a 
posteriori  discrete  value  functions 

mLi  =  inf  :  X  €  Bi}  and  =  inf  {Ff  :  x  €  B^}  ,  (37) 


respectively.  Making  use  of  (3-5)  transport  by  the  flow  gives 


inf  inf  {Ff_,(C'‘--")  :  x  €  B^.,  n  «l>i’(B;)}  > 


inf 


Next,  by  definition  of  the  functional  F^((), 


mi  =  inf{F*^(?“-)  +  AK*(x) 


:  X  6  Bi}  >  +  A 


inf  V'i(x)  . 


Thus  the  discrete  value  functions  satisfy  difference  inequalities.  Unfortunately,  this  does  not  give 
a  recursive  mechanism  for  computation.  Instead,  we  introduce  new  discrete  value  functions  in\_i 

and  mJt,  and  the  corresponding  value  functions 

iff  X  e  Bi  . 

We  then  define  the  transition  from  ,  i  €  /i_i}  to  {m'^,  t  e  /*}  by  the  following  two  stejjs 
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•  prediction  step  :  Tr«inspcrt  by  the  flow  gives 


ml  1  =  inf  mi  ,  . 

•  correction  step  :  The  contribution  of  the  new  observation  is  given  by 

ml  =  m'.i  +  A  inf  Vl(i)  . 

2  r€Bi 

By  construction,  it  is  clear  that  >  0  and  m^’*(i*^ )  =  0  for  the  true  state  trajectory,  and  wc 

define  our  observer  as  the  set 

=  argmin  =  {x  6  R”*  :  mf'^(x)  =  o}  ,  (38) 

leR™ 


or  equivalently 


=  U  n  =  {t  e  It  ■■  rn',  =  o}  . 


By  an  inductive  comparison  argument,  it  is  easy  to  show  that  rrif’^(x)  <  m^(x),  w'ith  the  consequence 
that  xf  C  2* ’*•  Therefore,  £  x^’‘  . 

Theorem  6  In  the  case  It  =  It-i  =  5],  =  for  all  i  €  /*.  let  {B'q  ,  i  6  dinntt 

a  finite  partition  of  a  bounded  domain  D  with  diam(fiQ)  <  6.  If  xj  e  D,  then  property  (le*)  will  hold 
for  this  flow-based  observer  algorithm. 

As  noticed  in  James  [3],  the  only  thing  that  matters  is  the  argmin  set,  not  the  value  function 
itself.  This  remark  can  be  used  to  design  a  simplified  algorithm  for  the  construction  of  the  set¬ 
valued  observer  (38).  We  introduce  the  piecewise-constant  logical  value  functions  mf  *(x)  taking 
values  TRUE  or  FALSE,  and  defined  iteratively  by  the  following  relations 

=  V  ^1-1  ' 

rn\  =  ml_i  aFj  . 


TRUE  if  inf  V1(.t)  =  0 
reBL 


FALSE  otherwise 


It  is  clear  that  ml  =  TRUE  iff  ml  =  0,  so  that  an  equivalent  expression  for  the  set-valued  observer  (38) 
is  given  by 

=  U  51  with  It  =  6  It  ■■  ml  =  true}  . 

•«Ji: 


Corollary  7  Under  the  assumptions  of  Theorem  6,  property  (**)  will  hold  for  the  simplified  algo¬ 
rithm. 
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Remark  8  In  the  particular  case  where  and  B]^  =  for  all  i  €  the 

algorithms  exhibit  a  parallel  structure  explicitly.  On  the  other  hand,  these  algorithms  ciss’jnie  that 
certain  calculations  can  be  made  exactly.  This  is  not  always  possible,  in  which  case  one  would  have 
e.g.  to  discretize  the  ODE  (5)  or  use  the  following  approximations 

jg.P{x)dx  ~  p(xi)  ~  , 

inf  m(ar)  ~  m(ii)  inf  ^(i)  ~  Vt{xl)  , 

*€^4  r6B; 

where  x^  is  some  point  in 
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rV  Numerical  Experiments 
A  A  One  Dimensional  Example 

We  consider  a  one  dimensional  model  with 

5(i,t)  =  — 0.2i  +  0.8cos(2.5t)  k(x)  =  sgn(i)  . 

Even  though  the  observation  function  is  discontinuous,  the  convergence  results  are  still  valid,  se*/ 
James  [4].  The  location  of  the  trajectory  is  determined  at  the  first  time  f*  it  crosses  the  origin,  su 
the  system  is  observable. 

Figure  1  (below)  shows  results  for  the  simplified  (logical)  fi.iw-based  observer  algorithm,  wi  li 
the  choice  =  /*_,  =  and  Bl  =  $^(^*-1)  for  all  i  6  /*.  '1  =  0.05,  6  =  0.02,  and  noise-fr-'e 
observations.  The  estimate  Xt  is  a  one-dimensional  set  for  tin!"  t  before  <*,  and  zero-dimensioi'al 
after  this  time. 

Figure  2  illustrates  the  numerical  results  obtained  from  the  !  lite  difference  nonlinear  filter  algo¬ 
rithm.  Here,  A  =  0.05,  6  =  0.005,  R  =  10“'*,  and  the  observation  path  was  noise-free.  Notice  the 
jumps  in  the  conditional  mean  trajectory  and  the  peaking  of  the  conditional  density  function  each 
time  the  origin  is  crossed.  Numerical  viscosity  causes  the  density  to  spread  between  these  times. 

Figure  3  shows  results  for  the  finite  difference  observer  algorithm,  with  S  =  0.02,  A  =  0.0198. 
V  =  1.01,  and  noise-free  observations.  The  plot  of  the  value  function  clearly  shows  the  valley 
containing  the  state  trajectory. 

Figure  4  shows  results  for  the  flow-based  nonlinear  filtering  algorithm,  with  the  choice  /f  =  /^_i  = 
and  Bl  =  ^A(fii_i)  for  all  t  g  /*,  A  =  0.05,  6  =  0.02,  E  =  lO"*.  and  noi.se-free  observations. 
Marginals  for  the  conditional  density  are  shown  for  times  befor.  and  after  time  t’. 


Figure  1.  Flow-based  observer,  simplified  algorithm. 
State  X,  and  estimate  x,  trajectories. 


I 


j 


X 


Figure  2.  Finite  difference  nonlinear  filter. 

i)  State  X,  and  conditional  mean  E(X(|3^,]  trajectories;  (b)  Conditional  density  function 
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B  A  Four  Dimensional  Example 


We  consider  here  the  problem  of  target  motion  analysis,  which  is  to  estimate  the  trajectory  (positioi 
and  velocity)  of  a  target  moving  at  constant  speed  along  a  straight  line  at  the  surface  of  the  sea. 
We  suppose  that  bearings-only  measurements  are  available  in  discrete  time,  taken  from  a  moving 
observation  platform.  If  the  observation  platform  itself  moves  at  constant  speed  along  a  straight  line, 
the  problem  is  non-observable.  However,  as  soon  as  the  observation  platform  changes  its  course,  the 
problem  becomes  observable.  Assuming  that  the  direction  of  motion  of  the  target  is  known,  which  is 
true  in  the  case  of  perfect  observations,  we  c«in  reduce  the  problem  to  three  dimensions.  The  state 
vector  is  A”  =  (x,  y,  v)  and  the  state  equation 

it  =  Vt  yi  =  0  tij  =  0  . 

The  observation  function  is 

x-x^ 

h{x,y,v,t)  =  arctanj  ^  ]  - 

where  is  the  (known)  position  of  the  observation  platform  at  time  t. 

For  this  problem,  the  flow  is  known  explicitly,  and  the  flow-based  algorithms  (for  both  the 
nonlinear  filtering  and  the  observer  case)  are  explicitly  parallelizable.  A  variant  of  the  flow-based 
NLF  algorithm  has  been  implemented  at  INRIA  on  a  16K  Connection  Machine  from  Thinking 
Machines  Corporation.  Numerical  experiments  have  been  carried  out,  using  noisy  observations  with 
standard  deviations  ranging  between  one  and  five  degrees.  The  goal  is  to  find  better  maneuvers,  and 
to  investigate  them  off-line.  The  filter  is  not  intended  to  be  run  in  real-time  on  the  ship. 

Acknowledgment:  Research  supported  by  NSF  Grant  “USA-France  (INRIA)  Collaborative  Re¬ 
search  in  Stochastic  Control”  NSF-iNT-89-06965. 


References 

[1]  P.  DUPUIS  and  H.  ISHII,  On  Lipschitz  continuity  of  the  solution  mapping  to  the  Skorokhod 
problem,  with  applications,  Stochastics  and  Stochastics  Reports  35  (1-1-2)  31-62  (1991). 

[2]  M.R.  JAMES,  Asymptotic  Nonlinear  Filtering  and  Large  Deviations  with  Application  to  ob¬ 
server  Design,  Ph.D.  Dissertation,  University  of  Maryland  (SRC  Technical  Report  Ph.D.  88-1) 
(1988). 

[3]  M.R.  JAMES,  Finite  time  observer  design  by  probabilistic-variationad  methods,  to  appear, 
SIAM  Journal  on  Control  and  Optimization  29  (4)  (1991). 

[4]  M.R.  JAMES,  A  numerical  method  for  finite  time  observers,  preprint  (1990). 

[5]  H.J.  KUSHNER,  Probability  Methods  for  Approximations  in  Stochastic  Control  and  for  Elliptic 
Equations,  Academic  Press,  New  York,  1977. 

[6]  P.L.  LIONS,  Neumann  type  boundary  conditions  for  Hamilton-Jacobi  equations,  Duke  Math. 
J.  52  (3)  793-820  (1985). 


19 


Finite  Dimensional 
Approximate  Filters  in  the 
case  of  High  Signal-to-Noise 
Ratio 

E.  Pardoux,  M.C.  Roubaud 
Universite  de  Provence  and  INRIA 

Abstract 

We  present  some  recent  results  on  nonlinear  filtering  problems 
with  high  signal-to-noise  ratio.  We  concentrate  mainly  on  the  scalar 
case,  where  the  observation  function  is  not  one  to  one.  We  describe 
two  situations  where  a  good  suboptimal  filter  is  provided  by  a  bunch 
of  one  dimensional  filters,  together  with  statistical  tests  for  choosing 
which  filter  should  be  fdlowed. 

1  Introduction 

It  is  by  now  well  known  (see  e.g.  Pardoux  (9])  that  the  nonlinear 
filtering  problem  is  a  difficult  one,  whose  optimal  solution  is  in  most 
cases  pven  only  by  the  solution  of  an  infinite  dimensional  equation, 
the  Zakai  equation. 

On  the  other  hand,  up  to  now  all  pratical  filtering  problems  are 
solved  by  approximate  linear  filters,  in  particular  the  well-known 
extended  Kalman  filter  (see  e.g.  [9]).  However,  the  extended  Kalman 
filter  does  not  rely  on  any  mathematical  foundation,  it  is  known  both 
from  theory  (see  [9],  Picard  [14]  and  section  2  below)  and  experience 
that  it  sometimes  behaves  very  poorly,  while  it  gives  very  satisfactory 
results  in  many  situations. 

A  good  framework  for  a  mathematical  analysis  cd  approximate 
filters,  including  the  extended  Kalman  filter,  is  the  situation  <rf  a 
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high  signal-to-noise  ratio,  i.e.  we  consider  the  fcdlowing  non  linear 
filtering  problem  : 

/  dX,  =  /(X,)dt  +  ff(X,)dV, 

\  dYt  =  h(Xi)dt+£dWt 

where  {Xj}  is  the  unobserved  process  to  be  filtered,  {y,}  is  the  ob¬ 
servation  process,  {V(}  and  {IVt}  are  mutually  independent  standard 
Wiener  processes,  all  processes  being  scalar  for  simplicity.  The  goal 
is  to  obtain  asymptotic  results  as  £  — ►  0  (with  e  >  0). 

In  the  case  where  h  is  one  to  one,  this  problem  h<is  first  been 
analysed  by  Bobrovsky,  Zakai  [2]  and  Katzur,  Bobrovsky,  Schuss  [6], 
Jean  Picard  has  then  given  a  very  complete  mathematical  analysis 
of  this  problem,  see  [10],  [11],  [12],  [13],  and  Bensoussan  [1]  has 
given  another  proof  of  most  of  Picard’s  results.  Those  results  can  be 
very  roughly  summarized  as  follows  :  for  small  £  >  0,  the  variance 
of  the  conditional  law  is  of  order  £,  the  optimal  and  suboptimal 
filters  have  a  short  memory  (old  observations  are  quickly  “forgotten” 
by  the  filter),  and  there  exist  various  finite  dimensional  suboptimal 
filters  whose  output  is  close  to  the  conditional  expectation  of  Xt 
^ven  {y«,0  <  s  <  t}  (including  the  extended  Kalman  filter  and  a 
one  dimensional  filter  whose  “error”  is  of  the  order  of  £).  Let  us 
note  that  analogous  results  have  been  established  for  discrete-time 
problems  by  Milheiro  [7]. 

A  second  class  of  problems  concerns  the  case  where  h  is  not  one 
to  one.  Two  cases  of  such  an  h  are  as  follows.  Case  A  is  where  h 
is  locally  one  to  one  ;  in  the  situation  dimX  =  dimT  =  1  which 
we  shall  consider  below,  this  means  that  h  is  piecewise  monotone. 
Case  B  is  where  dim  A  >  dimT  (say  dim  AT  =  2,dimy  =  1);  and  h 
is  a  function  of  say  X}  only,  h  being  either  monotone  or  piecewise 
monotone. 

The  aim  of  this  paper  is  to  present  some  recent  results  for  the 
two  above  problems.  We  shall  mainly  be  concerned  with  case  A,  and 
give  some  hints  concerning  case  B. 

The  organisation  of  the  paper  is  as  fdlows.  In  section  2,  we  shall 
present  case  A,  discuss  the  problem  in  case  e  =  0  (no  observation 
noise),  introduce  two  “detectability  assumptions”  and  compare  them. 
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In  section  3,  we  shall  treat  in  some  detail  case  A  under  one  of  the 
two  detectability  assumptions.  In  section  4,  we  shall  comment  on 
some  recent  results  concerning  case  B. 

Let  us  insist  upon  the  fact  that  we  shall  not  try  in  this  paper 
to  formulate  the  most  general  known  results,  but  rather  to  present 
some  of  the  main  ideas  on  simple  example.  More  general  results  can 
be  found  in  the  references  which  we  shall  give  below. 

2  Case  A  :  Piecewise  Monotone  Observation 
Function. 

In  this  section,  we  want  to  formulate  the  nonlinear  filtering  problem 
with  small  observation  noise 

/  dX,  =  /(X,)dt  +  ff(X,)dV, 

\  dy,  =  h(Xt)dt  +  edWt 

where  all  processes  are  scalar  and  h  is  piecewise  monotone. 

To  be  more  specific,  let  us  assume  that  g  =  1  ;f,h  ^  (M)  with 

bounded  derivatives  ;  h  has  a  unique  minimum  at  z  =  0,  such  that 

h(0)  =  h'(0)  =  0 

and  h'(x)  <  0  for  *  <  0,h'{x)  >  0  for  i  >  0. 

Remark  2.1  ;  Inefficiency  of  the  extended  Kalman  filter.  The  ex¬ 
tended  Kalman  filter  for  the  above  situation  is  : 

f  dX,  =  f(Xt)dt  +  e-^Rth'{X,){dY,-h(X,)dt) 

\  dRt/dt  =  2f\Xt)Rt  +  1  - 

Note  that  (except  possibly  near  t  =  0)  Rt  of  the  order  of 
e,  hence  E~^Rih'(Xt)  is  of  the  order  of  e”'.  Replacing  dyi  by  its 
expression  in  the  first  equation  above  yields  : 

dX,  =  [f{Xt)  +  €-^R,h'{X,)(h(Xt)  -  h(Xt))]dt  +  e-^R,h'{X,)dW, 

We  note  that,  thanks  to  the  leading  term  in  the  drift,  the  ex¬ 
tended  Kalman  filter  is  such  that  h{Xt)  tends  to  follow  closely  h{Xt). 
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However  this  effect  of  the  drift  is  counterbalanced  by  tbe  ncnse  term, 
which  is  also  the  order  of  But  the  main  point  is  that  the  ex¬ 
tended  Kalman  filter  has  no  tendency  whatsoever  to  correct  a  wrong 
sign  :  if  A’t  and  Xt  have  the  same  sign,  the  drift  tends  to  get  then 
closer,  and  if  Xt  and  Xt  have  opposite  signs,  Xt  and  At  tend  to  stay 
for  away  one  from  the  other  (while  the  drift  tends  to  get  h{Xt)  and 
h{Xt)  closer).  In  fact,  if  /(O)  =  0,  then  Xt  never  changes  sign,  since 
h'(0)  =  0,  while  X,  changes  sign  after  arbitrarily  large  times  with 
positive  probability-O 


Our  aim  is  to  present  an  efficient  finite  dimensional  filter  for  the 
above  problem  in  two  particular  cases.  In  order  to  simplify  the  sequel, 
we  shall  from  now  on  assume  that  h  is  piecewise  linear,  i.e.  : 


h(x) 


h+x 

h-x 


,  I  >  0 
,  *  <  0 


with  A+  >  0,/i_  <  0.  Of  course,  h  is  no  longer  C’. 

We  want  to  consider  cases  where  the  variance  of  the  conditional 
law  of  A(,  given  {K,;  0  <  s  <  t}  is  small  (at  least  “most"  of  the  time). 
In  order  to  see  what  kind  of  condition  is  needed,  let  us  now  consider 
the  (simpler)  case  where  e  =  0  : 


/  dXt  =  /(A,)dt-t-dV, 
\  dYtIdt  =  h{Xt) 


Since  h{Xt)  is  completely  observed,  it  suffices,  in  order  to  recover 
Xt,  to  recover  its  sign.  We  first  note  that  we  know  exactly  when 
Xt  reaches  0,  and  when  it  does  not  change  sign.  Hence  tbe  problem 
is  :  given  a  time  interval  [a,  6]  such  that  A(  /  0,  1  €  [a,  6],  can  one 
recover  the  sign  of  Xt,  from  the  observation  of  h{Xt),  t  G  [o,i>]  ? 

This  is  clearly  impossible  in  the  case  /  =  0  and  /»_  =  -h+, 
since  there  is  no  way  to  recover  the  sign  of  a  Wiener  process  from  its 
absdute  value.  Therefore  we  need  to  introduce  what  we  call  a  “de¬ 
tectability  assumption”.  There  are  two  such  possible  assumptions. 
The  first  one  is  : 


(DAI)  |h+|5f|h-| 
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In  this  case,  we  have  that  ; 

^  <  h(X-)  >,=  h%,  t  €  [a, 6)  if  X,  >  0,  t  e  [0,6] 
and 

4  <  h{X.)  >,=  hi,  t  e  [0,6]  if  X.  <  0,  t  e  [a,6] 

at 

In  other  words,  under  (DAI),  the  quadratic  variation  of  the  process 
{h(A'()}  tells  us  instantaneously  the  sign  of  Xt- 

If  {DAI)  does  not  hold  (say  6+  =  — h_  =  1,  i.e.  h{x)  =  ji|),  we 
can  still  do  something,  provided  the  drift  helps  us.  We  now  formulate 
the  second  detectability  assumption,  assuming  for  simplicity  that 
h(i)=  |z|. 


(DA2)  f{x)  +  f{-x)^Q,xeIR 
Let  Zi  =  h(Xt)-  If  Xi  >  0,  t  6  [o,6],  then 

dZt  =  f{Zi)dt  +  dV,,  t  €  [0,6], 

If  A'l  <  0,  t  €  [o,6],  then 


dZt  =  -f{-Z,)dt-dVut€[a,b] 

i.e.  we  observe  a  Wiener  process  plus  a  drift,  which  differs  depending 
on  the  sign  of  Xt,  thanks  to  {DAY).  The  log-likelihood  ratio  is  given 
to  us  in  this  case  by  the  Girsanov  theorem  :  for  o  <  t  <  6, 

L{a,t)  =  J\f{Z.)  +  f{-Z,)]dZ.  -  f{-Z,)]ds 

Note  that  if  A'(  >  0, 1  €  [a,  6], 

L{a,t)  =  i  j‘[nZ.)  +  f(-Z.)fds  +  +  n-Z.)]dV. 

and  if  Xt  <  0,  t  €  [a,  6], 

L{a,i)  =  j\f(Z.)  +  f{-Z.)]^ds  -  J\f{Z.)  +  f{-Z.)]dV. 
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Hence  L(a,t)  is  likely  to  be  positive  in  case  X(  >  0,  a  <  t  <  6 
and  to  be  negative  in  case  Xt  <  0.  The  quality  of  the  test  (i.e.  the 
probability  of  making  the  right  decision)  depends  on  the  value  of 

V,= 

Ja 

which  of  course  depends  on  t  -  a.  The  larger  Ut  is,  the  smaller  the 
probability  of  making  a  wrong  decision  is,  since  from  the  strong  law 
of  large  numbers  ; 

1 

.7—  as  1  —  00,  a.s.  on  {Uao  =  +00} 

Ut  i 

if  >  0,  1  >  a,  and  the  limit  is  —5  if  Jf,  <  0,  <  >  a.  However,  in 
most  cases  Xt  changes  sign  after  some  time,  and  we  don't  want  to 
wait  too  long  before  making  a  decision. 

Clearly,  the  situation  is  very  different  under  (£>^41)  and  under 
(DA2).  Under  (BAl),  the  sign  of  Xi  is  detected  instantaneously, 
while  under  (BA2)  some  time  is  needed  for  the  probability  of  a  wrong 
decision  to  be  small. 

Let  us  now  describe  the  stragegy  for  a  finite  dimensional  subop- 
timal  filter  in  the  case  s  >  0.  We  expect  that  most  of  the  time  the 
conditional  law  of  Xi,  given  {y,,0  <  a  <  f}  will  be  almost  completely 
concentrated  on  one  side  of  0.  Hence  a  good  estimate  of  Xt  should  be 
given  by  an  approximate  filter  for  problem  (2.1)  with  h(x)  replaced 
by  h+i  (resp.  /i_x)  if  X(  >  0  (resp.  <  0).  Therefore  we  consider  the 
two  auxiliary  filtering  problems  : 


(2.4)  1 

\  dX,  =  f(X,)dt+g{Xt)dVt 
[  dYt  =  h+X,dt  +  £dW, 

and 

(2.1.)  1 

\  dXt  =  f{Xt)dt+g{Xt)dV, 

[  dYt  =  h.Xtdt  +  edW, 

to  which  we  associate,  following  Picard  [13]  the  two  filters 

(2.2+)  d-V+  =  /(X*)di+e-\dy,  -  h+X*di) 
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(2.2_)  dX^  =  f(X;)dt-r^{dYt-h.X^di) 

The  filtering  procedure  which  we  propose  consists  in  following 
alternatively  the  filter  (2.2^)  and  the  filter  (2.2_).  Note  that  since 
these  two  filters  are  given  by  stiff  equations,  the  way  they  are  initial¬ 
ized  at  the  time  where  we  start  to  follow  them  is  irrelevant.  In  order 
to  choose  which  filter  to  follow,  we  need  : 

a)  to  isolate  time-intervals  on  which  {Jf(}  is  very  unlikely  to 
change  sign. 

b)  to  decide  which  is  the  sign  of  {X(}  on  a  time  interval  on  which 
we  believe  that  this  sign  is  fixed. 

We  then  follow  the  corresponding  filter  until  a  possible  zero  cross¬ 
ing  by  Xt  is  detected. 

When  Xt  is  close  to  zero  and/or  we  cannot  decide  its  sign,  we 
estimate  it  by  0. 

This  program  has  been  rigorously  developped  under  {DAI)  by 
Fleming,  Ji,  Pardoux  [3]  and  Roubaud  [15]  in  the  piecewise  linear 
case  ([15]  allows  noise  correlation  and  a  piecewise  constant  diffusion 
coefficient)  and  by  Fleming,  Pardoux  [5]  in  the  nonlinear  case  with 
a  piecewise  monotone  observation  function.  Numerical  experiments 
are  reported  in  Fleming,  Ji,  Salame,  Zhang  [4]. 

The  same  program  has  been  developped  under  {Z?y42)  by  Roubaud 
[15]  in  the  piecewise  linear  case,  and  numerical  experiments  are  re¬ 
ported  in  Milheiro,  Roubaud  [8].  In  the  next  section,  we  shall  present 
some  of  the  ideas  in  Roubaud  [15],  on  the  above  example. 

3  Case  A  :  Piecewise  Monotone  Observa¬ 
tion  Function  under  the  Detectability  As¬ 
sumption  (DA2) 

We  consider  again  the  filtering  problem  (2.1)  under  the  condition 
{DA2)  : 

{  dXt  =  f{Xt)dt  +  dV, 

^  >  1  dYt  =  lX,ldt  +  edVY, 
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with  the  assumption 

(3.2)  3ifc  s.t.  l/(i)|  <  it(l  +  |i|),  xem, 

and  the  initial  condition  Xo  =  zq.  We  associate  to  (3.1)  the  two 
“filters”  (see  section  2)  : 

(3.3+ )  dX+  =  /(X,+  )  dt  +  £->  {dY,  -  X+dt) 

(3.3- )  dXr  =  f(Xr  )  dt  ~  £->  (dY,  +  Xr  dt) 
with  the  initial  condition  X^  —  Xo,X^  =  -x^. 

3.1  Detection  of  the  zero  crossing  by  {X,} 

We  first  need  to  detect  when  X,  might  cross  zero.  For  that  sake,  we 
shall  make  use  of  the  : 

Lemma  3.1  For  any  0  <  a  <  b,  c  >  0,  0  <  a  <  1/2  and  0  <  0  < 
1  -  2a,  there  exist  fc  >  0,  £o  >  0  s.t.  for  any  0  <  £  <  £o, 

P(sup  llXtl  -  .y,'*'!  >  C£“)  <  exp(-k/s:^) 

[o,k] 

Proof  :  It  follows  readily  from  (3.1),  (3.2+)  and  the  Ito-Tanaka 
formula  ({!<,<  >  0}  denotes  the  local  time  at  0  of  {y*})  • 

d(|y,|-y+)  =  £->(|y.l-y+)d« 

+(6ign(y,)/(y,)  -  f{X^))dt  +  si^niX,)dV,  -  dW,  +  2dL, 
\X,\-Xt  =  e-'AdXol-Xo^) 

+  /'e-''-*>/'(sign(y.)/(y.)-  /(y+))ds 
Jq 

+  /'^e-<'-)/'sign(y,)dV,-  dW. 

+2  /  e-^'-^l^dL. 

Jo 

It  suffices  to  establish  the  result  for  each  of  the  terms  in  the  above 
right  side.  The  four  first  terms  are  treated  as  in  [5,  Lemma  3.1]  with 
the  help  of  Lemma  3.2  below.  The  last  term  is  analysed  as  in  [15, 
Proposition  1.3.1). 0 
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Lemma  3.2  For  any  0  <  a  <  b,  there  exists  c  >  0  s.t. 

E  ^  sup  expIcA"*]  >  <  00 

J 

sup  E  <  sup  exp[c(A’,‘'’  )^]  I  <  oc 

‘>0  l,a<<<6  J 

Proof  :  The  first  result  is  well-known  (see  e.g.  [15,  Lemma  1.2.7]). 
The  second  one  can  be  established  as  follows  : 

dX+  =  f(X+)dt  +  £->(|A',|  -  X+)dt  +  dW, 

For  X  6  iR  and  {Zt]  a  bounded  variation  process,  let  Ut{x,z) 
denote  the  scdution  of  : 

f/,(i,2)  =  i+  f‘  f(U.(x,z))ds  +  W,  +  Z, 

JO 

Let  =  Ui{xo,Z’^),  where  {Z^}  is  the  smallest  increasing  process 
s.t. 

£/,(io,Z")>  |X,|,  t>0, 

and  up  =  Ut(xo,Z”'),  where  {Zp)  is  the  largest  decreasing  process 
s.t. 

Ui(xo,Z”')<\XtU  t>0. 

It  follows  from  a  comparison  theorems  for  one  dimensional  SDEs  that 
U,(xQ,Zr)  <  X+  <  U,{xo,Z'^) 
and 


E{  sup  exp[cf/i*(io,Z"')]+  su;,  exp[ct/'*(io,Z^)])  <  oo 
<«(“.*]  !€[<».»] 

follows  from  the  same  result  for  {|Jf(|}.  Note  that  another  proof  of 
the  second  part  of  this  Lemma,  which  carries  over  to  higher  dimen¬ 
sion,  is  given  in  [15,  Lemma  I.2.8].0 
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With  the  help  of  Lemma  3.1,  we  can  easily  huild  a  procedure 
which  detects  the  possible  zero  crossings  by 

We  choose  e  >  0  and  0  <  o  <  1/2.  Whenever  <  c£“,  we 
conclude  that  Xt  might  be  zero,  and  we  choose  0  as  the  estimate 
of  Xt.  Whenever  X^  >  c£“,  we  decide  that  ^  0  and  we  try  to 
estimate  its  sign.  For  any  0  <  a  <  5,  we  define 

C(a,6)  =  {X+  >c£“,a<l<5} 

The  next  and  last  step  conasts  in  a  test  for  deciding,  conditionned 
upon  the  observation  to  belong  to  C(a,b),  upon  the  sign  of  Xt,t  £ 
[a,b]. 

There  are  two  possible  tests  for  this  problem.  The  first  one  is 
an  extrapolation  of  the  test  used  in  the  case  f  =  0,  and  the  second 
one  is  a  likelihood  ratio  test  based  on  the  outputs  of  the  two  filters 
(3.3+)  and  (3.3.). 

Before  presenting  those  two  tests,  let  us  formulate  a  stronger 
version  of  (DA2),  which  will  be  supposed  to  hold  throughout  the 
rest  of  this  section  : 

(DA2s)  3  c,d>0  s.t.  inf  (/(i)  + /(y)]  >  d 

|x-V|<c 

3.2  Deciding  the  Sign  of  Xi  ;  a  Test  based  on  the  In¬ 
crements  of  {y,}. 

Define  ^ 

l\fiy)  +  f{-y)]dy 

Jo 

L{a,b),  which  was  defined  in  section  2,  can  be  rewritten  as  : 

L(a,b)  =  FyZ,)-  +  l'{Zt)]dt 

+  \j\f\-Zt)-fHZ.)]dt 

Of  course,  in  the  case  e  >  0,  Zt  =  h(Xt)  is  not  observed,  and 
L(a,b)  is  no  longer  a  statistics.  We  note  that  ; 
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1  I 

£-‘(y,+. -y,)  =  £-»y  z.ds  +  w,+,-w, 

Let  m  =  [£"*(6— a)]  ((■]  denotes  the  integer  part  of  its  argument), 
=  a  +  A;£,  A:  =  0, 1, . . . ,  m, 

2*  =  £“’(^<11+1  “  <;  =  0,l,...,m  -  1 


We  can  then  define  the  statistics  : 

=  F(Z„-t)  - -r(Zo) 

+  e/2’^[/'(-Z,)  -  f'(Z,)  +  f(-Z,)  -  f(Z,)] 

*=o 

Assuming  in  addition  to  the  above  hypotheses  that  /'  is  Lipschitz, 
it  is  not  hard  to  show  that  for  small  £  >  0  £*(a,6)  is  close  to  L{a,b). 
Hence,  if  L‘(a,b)  >  0  (resp.  <  0),  and  provided  the  observation 
belongs  to  C(a,b)  and  S^{f(Zt)  +  f{-Zt)Ydt  is  large  enough,  there 
is  a  high  probability  that  >  0  (resp.  <  0),  t  €  [a, 6], 

Again,  the  details  can  be  found  in  [15].  We  note  that  the  exten¬ 
sion  of  this  method  to  higher  dimension  requises  that  /  be  a  gradient. 

3.3  A  Likelihood  Ratio  Test  based  on  the  Outputs  of 
the  two  Approximate  Linear  Filters. 

We  consider  again  the  filtering  problem  (3.1),  to  which  we  zissociate 
the  two  approximate  linear  filters  (3.3+)  and  (3.3_).  Note  that,  if 

>’<  =  <T{y,:0<s<<}, 


dYt  =  E(\Xi\fy,)dt  +  edir, 

where  {ft],  the  innovation  process,  is  a  standard  Wiener  process  (see 
e.g.[9]).  We  expect  that  if  >  0  (resp.  <  0),  t  €  (o,6],  then  X, 
(resp.  X,~)  is  very  close  to  FdAd/Ti),  at  least  after  some  time. 
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Therefore  we  introduce  the  following  quantity,  which  we  interpret 
as  being  an  approximate  log-likelihood  ratio.  Let  a  <  e  <  b.  Define 

i'  =  J\x* + xr)dY,  -  £-V2  j[*(ix+p  -  ijerhdt 

We  shall  now  show  that,  provided  e  —  a  is  not  too  small  and  b  —  e 
is  large  enough,  the  conditional  probability 

P(t  >QIA+{a,b)), 


where  ^4+  (a,  6)  =  {X,  >  0,  a  <  i  <  6} ,  is  very  close  to  one.  A  similar 
result  holds  for  P(L‘  <  0/A_(a,6)).  let 

ras/t 

M+  =  exp[£-‘  /  (X.  -  IJV.DdW,  -  £-V2  /  (X.  -  |Ar,|)^ds] 

Ja  Ja 

and  P*  be  a  new  probability  measure  given  by  : 


F>om  Girsanov’s  theorem. 


dY,  =  Xtdt  +  £dW+,  0<t<b 

where  {W7'',0  <  t  <  6}  is  a  standatrd  Wiener  process  under  P"*". 
Hence,  again  from  well-known  results  from  nonlinear  filtering,  if 


JV+  =  E+{Xt/y,),  0  <  t  <  6 


then 


dYi  =  X^dt  +  edu+,0  <t<b 


We  have 

t  =  J\x+ +  Xr)di^t +€-^/2j\x* +Xr)^dt 
+  J\x*  +  XniXt  -  Xt)dt 


Finite  Dimensional  Approximate  Filters 
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We  shall  show  that  on  A+(a,4)  the  third  term  above  is  negligible, 
compared  to  the  second  term.  Hence  the  agn  of  L'  is  given  by  that 
of 

+  e-^/2j\xt  +  xr  fdt 

=  W(,  +  1/2  <  IV  >6 

where  {iVi}  is  a  P+  martingale.  Hence  >  Qj  <  N  >j>  r) 

is  close  to  one  when  r  is  large.  We  now  establish  : 

Lemina  3.3  For  any  Q  <  0  <  1,  r  <  {b  —  e)tP,  there  exists  k,  £o>  0 
s.t.  for  any  £  6  [O.fo], 

P{£-‘^  j\xt  +  Xffdt  <r)<  exp(-fc/£^) 

Proof  :  From  lenuna  3.1  (and  its  analogue  with  A','*'  replaced  by 
-A,~),  for  any  0  <  1,  there  exists  k,£o  s.t.  : 

P({sup  ||A,|  -  A+1  >  £}  u  {sup  ||A,1  +  ATI  >  |})  <  exp(-kl£^) 

[a, 6]  ^  [a,6)  r 

However, 

|(A+  +  AD  =  -£-HXt  +  xn  +  /(A+)  +  /(AD 
hence 

A,  +  AT  =  e-(<-)A(A;  +  AT)  +  f  c-('-*)/-[/(AT)  +  /(A/  )]ds 

Ja 

The  first  term  in  the  above  right  side  is  very  small  for  t  >  e.  The 
second  term  is  bounded  below  by  ; 

s(l  ^  hjf  [/(A+)  +  /(AD] 

O  VI VO 

Provided  |A+  +  Af  |  <  c,  we  deduce  from  (DA  2s)  that  : 

/(A+)  +  /(Ar)><^ 
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The  result  now  follows.  0 
The  fact  that  the  term 

€-^J\x+  +  xr)ix*-x+)dt 

is  negligible  on  A'*'(^a,b)  follows  from  Lemma  3.3,  the  obvious  remark 
that  P"*"  and  P  coincide  on  A'^(a,b)  (since  =  1  on  this  set)  and 
Theorem  3  from  Picard  [13]  which  states  that  for  any  p  >  1, 

(£+[|Jf+  -  XtV‘])'‘'‘  =  0(£3/^) 

We  can  easily  conclude  from  the  above  : 

Proposition  3.1  There  exists  a  continuous  decreasing  function  p  : 
iR+  — *  iR+i  with  limi_^.oo  p(i)  =  0,  and  for  any  p  >  1,  there  exist 
k,  £o  >  0  s.t.  : 

P{{L‘  <  0}  n  4+(a,6)  nC(a,5))  <  ks’>  +  p(6  -  a) 
for  any  £  €  {0,£o]-  0 

We  note  that  the  difference  with  the  results  under  (DA  1)  is  the 
appearance  of  the  term  p{b-a) :  for  a  fixed  interval  [a,b],  under  (DA 
2)  the  probability  of  making  a  wrong  decision  does  not  tend  to  zero 
as  e  I  0.  This  is  consistent  with  the  results  is  the  e  =  0  case. 

4  Remarks  on  the  problems  with  dimX  > 
dimy 

Suppose  that  dim  X  =  2  and  dimT  =  1,  and  that 
dYt  =  h{X})dt  +  £dW, 

Assume  first  that  h  is  monotone.  Then  one  can  show  (see  Yaesh, 
Bobrovsky,  Schuss  [16],  Picard  [14])  that  there  exist  efficient  lin¬ 
earized  filters,  provided  that  the  variance  of  the  conditional  law  of 
Xt,  given  {y,,0  <  a  <  f}  is  small.  But  now  this  need  not  be  the  case 
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in  general.  It  is  the  case  for  the  following  model,  which  has  been 
rigorously  analysed  by  Milheiro  [7], 

f  dx}  =A(x},x^)dt 
{  dX?  =MX},Xf)dt  +  g(X},X})dVt 
[  dy,  =  h{X})dt  +  cdWt 

where  /  is  C^,  for  each  ij.xj  — ►  /i(ii,i2)  is  one  to  one  and  its  in¬ 
verse  is  Lipschitz,  and  some  further  regularity  assumptions  are  satis¬ 
fied.  Milheiro  [7]  gives  a  two  dimensional  filter  with  output  Xt  which 
is  such  that,  for  t  >  <o  >  0, 

E(\X}  -  X/p)  =  0(e3/^),  Ei\X^  -  X?p)  = 

It  is  also  possible  to  derive  approximate  finite  dimensional  filters, 
even  when  the  covariance  of  the  conditional  law  is  not  small  with 
£,  provided  the  problem  has  a  special  structure.  Let  us  describe  a 
problem  which  has  been  successfuly  treated  in  Roubaud  [15].  Again, 
dim  Jf  =  2  and  dimV  =  1.  We  assume  that  ; 

/  dX,  =(fiX})  +  bXf)dt  +  GdV, 

\dYt  =h{X})dt  +  sdWt 

where  /  :  JR  -*  and  h  :  K  M  are  piecewise-linear  (with 

say  two  pieces),  h  being  non  monotone,  6  €  2R*,  G  is  a  2  x  2  matrix, 
and  {Vt}  is  two-dimensional  standard  Wiener  process.  As  in  the 
preceding  section,  we  associate  to  this  problem  two  (linear)  filters, 
and  test  procedures  to  decide  which  filter  to  follow.  The  conditional 
variance  in  the  i'  direction  is  small,  but  it  is  in  general  of  order 
one  in  the  direction.  The  fact  that  the  system  is  linear  in  Xf  is 
crucial  here.  Note  that  one  major  difference  with  the  situation  of  the 
preceding  section  is  that  the  filter  (or  at  least  its  second  component) 
does  not  have  a  short  memory. 
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Abstract 

The  purpose  of  this  paper  is  to  study  some  statis¬ 
tical  problems  :  parameter  estimation,  binary  detec¬ 
tion,  change  detection  (disorder  problem),  etc.  for 
partially  observed  diffusion  processes,  using  the  like¬ 
lihood  approach. 

It  is  shown  that  the  stochastic  PDE  related  to  the 
state  estimation  problem,  provides  also  a  way  to  com¬ 
pute  the  likelihood  function/ratio. 

A  recent  result  on  consistency  of  the  MLE,  in  the 
small  noise  asymptotics,  is  also  presented. 

1  Introduction 

Consider  the  following  partially  observed  stochastic 
differential  system,  defined  on  some  probability  space 

dX,  =  b{X,)dt  +  a(X,)dW, 

(1) 

dY,  =  h{X,)dt  +  dV, 

where  the  non  observed  process  {X| ,  t  >  0}  and 
the  observation  {Vi ,  t  >  0)  takes  values  in  R""  and 
R"*  respectively.  (IV, ,  f  >  0}  and  {K  ,f  >  0)  are 
independent  Wiener  processes  of  appropriate  dimen¬ 
sion,  with  covariance  matrix  7,  and  the  random  vari¬ 
able  Xo  is  independent  of  the  Wiener  processes,  with 
probability  distribution  po(z)dz.  The  available  in¬ 
formation  at  time  t  is  contained  in  the  <r-algebra 
J'r  =  ff(F, ,  0  <  s  <  f). 

The  first  problem  one  is  faced  with,  is  state  esti¬ 
mation  ;  to  estimate  recursively  the  state  X,  given 
observations  y,  up  to  time  (.  The  solution  to  this 
first  problem  is  given  by  the  Zakai  equation,  a  sto¬ 
chastic  partial  differential  equation  which  computes 
recursively  the  conditional  density  of  X,  given  3^,. 
This  assumes  that  the  partially  observed  dynamical 
system  (1)  is  completely  identified,  which  usually  is 
not  the  ease. 
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Therefore,  a  second  problem  is  to  assume  that  the 
model  (1)  is  parametrized  by  some  unknown  param¬ 
eter  0  in  0  C  R’’,  and  to  estimate  0  on  the  basis  of 
observations  y,.  Several  statistical  problems  are  in¬ 
troduced  in  Section  2  for  the  parametrized  model  ( 1) . 
Off-line  statistical  procedures  based  on  likelihood  are 
presented  in  Section  3.  It  is  shown  in  Section  4  that 
the  Zakai  equation  provides  also  a  way  to  compute 
these  likelihood  statistics. 

Another  issue  is  to  prove  that  the  statistical  algo¬ 
rithms  based  on  the  likelihood  approach,  actually  pro¬ 
vide  good  estimates,  in  some  asymptotic  sense.  A  re¬ 
cent  result  in  this  direction  has  been  obtained  for  the 
consistency  of  the  MLE,  in  the  small  noise  asymp¬ 
totics,  see  J  ames-LeGland  [4]. 

2  Statistical  problems 

Let  ©  C  R^  denote  the  parameter  space.  Assume 
that  observations  {Yt ,  0  <  f  <  T)  are  available  from 
the  following  model 

dX,  =  b,{X,)di  +  <T(,X,)dW! 

dY,  =  h,{X,)dt^dV> 

The  statistical  problems  to  be  considered  in  this 
paper  are 

(a)  parameter  estimation  :  estimate  0  €  0. 

(b)  binary  detection  :  decide  between  the  two  simple 
hypotheses 

Ho  ’  ^  —  ^0  t 

Hi  ;  0  =  01  . 

Another  related  problem  is  the  sequential  binary 
detection  problem. 

(c)  change  detection  (disorder  problem)  :  decide  be- 


tween  the  two  compoeite  hypotheses 


Introduce 


Ho  :  $  —  0Q  , 

Hi  :  there  exists  0  <  r  <  T,  such  that 

9  —  8q  on  0<t<r, 

0=61  on  r<t<T  . 

In  case  Hj  has  been  decided,  another  problem  of 
interest  is  t/>  estimate  the  change  time  r. 

A  variant  of  this  problem,  is  when  only  0q  is 
known  :  the  alternate  hypothesis  Hj  is  compos¬ 
ite  with  respect  to  both  r  and  0\.  In  case  Hi  has 
been  decided,  both  r  and  0i  are  to  be  estimated. 


Z;[e]kexpy\i(X.)dY,  -I  J’\h,{Xr)\^<ir'^ 

and  Zt[0]  =  Zf[0]  .  Provided  the  probability  mear 
sures  on  with  densities  {pj  >  ^  €  are  mutually 
absolutely  continuous,  the  statistical  model  defined 
above  is  dominated  by  some  probability  measure  P^. 
Indeed,  it  is  proved  in  [2]  that 

Proposition  3.1  The  probability  measures  in  Af  are 
mutually  absolutely  contintteud.  In  addition 

g  =Ej(Zr[e]|>'r), 

where  P,  is  the  reference  probability  measure. 


(d)  Bayesian  change  detection  (jump  Markov  param¬ 
eter)  :  recursively  estimate  fl,  given  assuming 
that  {$t ,  <  >  0}  is  a  finite  state  Markov  pro¬ 
cess,  independent  of  the  Wiener  processes,  with 
jump  intensity  matrix  Q.  This  problem  is  closely 
related  to  state  estimation,  see  Loparo-Roth- 
Eckert  [7], 

For  each  of  the  problems  listed  above,  the  first  step 
is  to  provide  an  expression,  in  terms  of  conditional 
expectation,  for  the  likelihood  function  (LF),  the  like¬ 
lihood  ratio  (LR),  or  the  generalized  likelihood  ratio 
(GLR),  depending  on  the  problem. 

3  Likelihood  based  off-line 
statistics 

Statistical  model  On  the  canonical  space  Q  = 
C([0,  T] ;  are  given 

■  a  pair  of  stochastic  processes  {A', ,  0  <  f  <  T) 
and  {y, ,  0  <  t  <  T)  taking  values  in  R"*  and 
R**  respectively, 


Parameter  estimation 

The  likelihood  function  for  the  estimation  of  $, 
based  on  observations  in  ^7- ,  is  given  by 

=EJ(^7-[e]|3'T).  (3) 

at-  y.j. 

and  the  maximum  likelihood  estimate  (MLE)  is  de¬ 
fined  as 

9  €  argmax  L[9]  . 
tee 

To  find  9,  one  can  use  an  iterative  optimization  al¬ 
gorithm  for  the  maximization  of  the  likelihood  func¬ 
tion  L\9].  An  alternative  approach  is  to  use  the  EM  al¬ 
gorithm,  as  proposed  by  Dembo-Zeitouni  [3].  This 
algorithm  is  based  on  the  following  immediate  con¬ 
sequence  of  the  Jensen  inequality,  where  /[#]  denotes 
the  log-likelihood  function 

m-m  =  iogEj,(^i>r) 

>  t',.{\oi^^^\yr)  =  Q\9.9']. 


■  a  family  Ad  =  {P*  ,  9  €  0}  of  probability  mea¬ 
sures, 

such  that  under  Pi 

dXt  =  6,(A,)d<-|-<r(Ar,)dW/ 

(2) 

dY,  =  h,(X,)dt  +  dV* 

where  {Wf  ,  0  <  <  <  T)  and  {V,'  ,  0  <  t  <  T)  are 
independent  Wiener  processes  of  appropriate  dimen¬ 
sion,  with  covariance  matrix  I,  and  the  random  vari¬ 
able  Ao  is  independent  of  the  Wiener  processes,  with 
probability  distribution  Po(')  Throughout  the  pa¬ 
per,  the  coefficients  are  assumed  to  be  continuous  and 
bounded  functions  on  R™. 

The  main  assumption  is  that  all  the  available  infor¬ 
mation  is  contained  in  Vr  =  <r{Yt ,  0  <  f  <  T). 


The  idea  of  the  EM  algorithm  is  to  replace  the  direct 
maximization  of  L[9],  by  the  iterative  maximization 
of  the  auxiliary  function,  i.e 

9"+‘  e  argmax  Q[e,  9"]  . 

»e9 

Under  mild  hypotheses,  the  sequence  {9" ,  n  >  0) 
converges  to  a  stationary  point  of  the  original  likeli¬ 
hood  function  L[9].  See  Campillo-LeGland  [2]  for  a 
comparison  between  the  two  ^proscbes. 


Binary  detection 

The  likelihood  ratio  for  deciding  between  hypothe¬ 
ses  Ho  and  H],  based  on  observations  in  ^7,  is  given 

dPo  i[9o]  ’  '  ' 


{ 

) 


where  Pi  =  P#,  for  i  =  0,1.  The  likelihood  ratio 
test  is  defioed  by  the  following  reject  region  for  the 
null  hypothesis  Hq 

>  c  , 

where  c  >  1  is  the  threshold. 


Sequential  binary  detection 

In  this  problem,  the  horizon  is  not  fixed.  Let  Rt 
denote  the  likelihood  ratio  for  deciding  between  hy¬ 
potheses  Ho  and  Hi,  based  on  observations  in  An 
admi5st6/e  dtciaion  policy  tot  the  sequential  binary  de¬ 
tection  problem,  is  defined  by  a  stopping  time  t  and 
a  -measurable  {0,  l}-valued  decision  random  vari¬ 
able  6  :  if  j  =  0  (resp.  ^  =  1)  the  null  hypothesis  Ho 
is  accepted  (resp.  rejected).  In  other  words,  6  defines 
a  reject  region  for  the  null  hypothesis  Ho-  A  ikrtahold 
decision  policy  is  defined  by  a  stopping  time  of  the 
form 

r  =  inf{(  >0  :  R,i  (A,B)} 
and  a  reject  region  for  the  null  hypothesis  Ho  of  the 
form 

{1  ,  if  ft.  >  B  , 

0  ,  if  ft,  <  .4 

where  0<-4<l<B<ooare  the  constant 
thresholds.  This  problem  has  been  studied  by  Baras- 
LaVigna  [1],  following  Liptser-Shiryayev  [6]. 


Note  that,  with  this  definition,  the  probability  asso¬ 
ciated  with  the  null  hypothesis  Ho  is  ftr  - 
The  generalized  likelihood  ratio  for  deciding  be¬ 
tween  hypotheses  Ho  and  H,,  based  on  observations 
in  I  is  given  by 


p  A  dft, 

ft  =  max  -77:- 
0<r<T  dPr 


L\r] 

=  max  ' 

13;^ 


where 


(6) 


. ,  ,  A  dPr 


\yj. 


=  E*(ZT[r]  |3^t)  . 


is  the  likelihood  function  for  the  estimation  of  the 
change  time  r,  based  on  observations  in  yr-  The  gen¬ 
eralized  likelihood  ratio  test  is  defined  by  the  following 
reject  region  for  the  null  hypothesis  Ho 


i?  >  c  , 

where  c  >  1  is  the  threshold. 

Moreover,  in  case  Hj  has  been  decided,  the  maximum 
likelihood  estimate  of  the  change  time  r,  based  on 
observations  in  ^7,  is  given  by 


f  G  argmaxl[r]  . 

0<r<T 

Introducing  the  ^-algebras  .  0  <  s  < 

f)  and  yi  =  —  V,  ,  <  <  t  <  1),  the  following 

decomposition  holds  for  the  likelihood  function  L[r] 


Change  detection  (disorder) 

The  statistical  model  for  this  problem  can  be  de¬ 
scribed  through  the  introduction  of  time  dependent 
coefficients  ;  for  0  <  r  <  T  ,  let  ft,  denote  the  proba¬ 
bility  measure  on  the  canonical  space  fi,  under  which 


dX,  =  brit,X,)di  +  ffiX,)  dw; 

(5) 

dY,  =  hr{t,X,)dt  +  dV' 

where  {tV,'' ,  0  <  t  <  T}  and  {V,'' ,  0  <  t  <  T)  are 
independent  Wiener  processes  of  appropriate  dimen¬ 
sion,  with  covariance  matrix  /,  and 


6,(f,i:)  =  6o(*) +  [*!(*) -l>o(i)]  l{r<t<r)  > 

K(t,z)  i  ho{x)  +  [hi{z)  -  ho(i)l  l{r  <  <  <  T)  ’ 

where  6i(z)  =  6«.(i)  and  h,(r)  =  />«,{x)  for  i  = 

0,1 

Introducing  for  0  <  r  <  T 

Ztlr]=expy\:(T,Xr)dYr  -  \  J' \hAr. Xr)\^ dr^ 


and  Zt[r]  —  Zflr]  ,  it  holds  that  the  probability  mev 
sures  (ft)  ,  0  <  r  <  T)  are  mutually  absolutely  con¬ 
tinuous.  Moreover 


dftt 


=  El{ZT[r]\yT). 


L[r]  =  EKZrlr]  \  ^r)  =  El(2.[0]  ■  Zf  [1]  |  ^t) 

=  E)  (El(Z.[o]  •  Zf  [1]  I  V  y,  V  yf )  1  yr) 

=  E;(z,[0)El(Zf[i]|:r.v3?f)|>v) 

=  e;(z.[o]  E;(zf[i]ir,v3^)i>v) , 

where  Zf[i]  =  Zl[$i]  for  t  =  0, 1.  Defining 
v,'(x)^El(Z)-[l]|>'5.v{X,  =x)), 

it  holds 

L\t]  =  El(Z.[0]W{.Y,)|yr) 

(T) 

=  E’(Z,[0]  .,;(AV)|yr)  . 

The  purpose  of  the  next  section  is  to  provide  some 
computational  procedure,  that  would  allow  to  numer¬ 
ically  compute  the  likelihood  based  statistics  intro¬ 
duced  so  far. 


4  Computational  likelihood 
statistics 


In  this  section,  the  link  between  the  likelihood 
based  statistical  problems  introduced  above,  and  the 


state  estimation  problem,  will  be  investigated.  At  this 
point,  it  is  necessary  to  introduce  some  notations  and 
definitions  related  to  nonlinear  filtering  and  smooth¬ 
ing. 

For  the  sake  of  simplicity,  any  reference  to  the  pa¬ 
rameter  9  will  be  dropped  for  the  time  being. 

□  FiUtring:  Let  pt  denote  the  unnormalized  condi¬ 
tioned  density  of  the  random  variable  Xt  given  X, 
defined  by 

(p„<^)  =  EH<i>(X,)Z.\y,)  (8) 

for  any  test-function  The  unnormalized  condi¬ 
tional  density  {pi ,  f  >  0}  satisfies  the  Zakai  equa¬ 
tion  [8] 

dp,  =  L'p,  di  -t-  fc'p,  dV,  ,  (9) 

where  L  is  the  following  partied  differentied  operator, 
associated  with  the  stochastic  differential  system  (1) 

n»  <%2  ^  <v 

□  Smoothing  (fixed-interval):  Let  T  >  0  denote  the 
fixed  end-time,  and  qt  denote  the  unnormalized  con- 
ditiona!  density  of  the  random  variable  Xt  given 
defined  by 

(q,,<i>)^E\<t‘{x,)ZT\yT)- 

Introducing  the  backward  Zakai  equation 

dti,  +  Lvt  dt  -f  h'vt  dYi  =  0  ,  tT  =  1  ■  (10) 

it  is  proved  in  Pardoux  [8,9]  that  (pi,v,)  is  indepen¬ 
dent  of  i  and  f,  =  p,  '  V,  .  In  addition 

=  =  *}). 

Existence,  uniqueness  and  regularity  results  for  sto¬ 
chastic  PDE  can  be  found  in  Pardoux  [8]. 

Let  now  L0  and  Lr{t)  denote  the  partial  differential 
operators 

^  .fl2  ^  p 

»j  =  l  *  * 

aj  o 

associated  with  the  stochastic  differential  equation  (2) 
and  (5)  respectively. 

Parameter  estimation 


Binary  detection 

A  similar  expression  holds  for  the  likelihood  ra¬ 
tio  (4) 

r,  (PT’  1) 

(pS-,1)  ■ 

Here  the  unnormalized  conditional  density  {pi ,  /  > 
0}  solves  the  Zakai  equation 

dp?  =  L-pl  d/ -H  A*pi  dy,  ,  (11) 

where  L,  =  Lt,  and  h,  =  A«,  for  ■  =  0, 1  . 

Change  detection  (disorder) 

1^1  {p? ,  f  >  0}  (f,  ,  f  ^  0}  denote  the  solution 

of 

dp?  =  i;(i)p?df4-/i;(()p?dy, , 


dt)?  +  ir(<)t)?  dt  +  h'At)v;  dy,  =  0  , 


uf  H  1  , 


respectively,  where  h-(l)  is  shorthand  for  hr(t,  •). 

The  generalized  likelihhod  ratio  (6)  is  given  by 

R=  max  .  (12) 

However,  a  much  more  efficient  expression  can  be  ob¬ 
tained.  Indeed,  for  all  0  <  f  <  T 

i.[r]  =  (p^,l)  =  (p?,v?), 

and  in  particular  for  <  =  r 


L[r]  =  (p?,v?)  =  (p»,t,,') 


dp?  =  L;p?d(-f/.;p?dy. 


Livl  dt  +  ftjv*  dYt  ~  0  y  Vj*  =  1  -  (15) 

Therefore,  it  is  enough  to  solve  two  stochastic  PDE, 
the  forward  equation  (14)  with  parameter  Oq,  and  the 
backward  equation  (15)  with  parameter  0i .  This  gives 
the  following  expression  for  the  generalized  likelihood 
ratio 

ft 

n  =  m20C  *7—7? — -r-  . 

0<r<7-  (p?.,l) 


The  following  expression  holds  for  the  likelihood 
function  (3) 

where  the  unnormalized  conditional  density  (pf ,  f  > 
0}  solves  the  Zakai  equation 

dp'  =  L;pf  di  +  hlp'  dY,  . 


which  is  much  more  efficient  than  the  original  expres¬ 
sion  (12),  which  would  require  to  solve  an  infinite 
number  of  stochastic  PDE  (see  Figure  1). 

Remark  4.1  The  expression  (13)  for  the  likelihood 
ratio  could  also  be  obtained  from  the  previous  expres¬ 
sion  (7)  obtained  by  decomposition. 


Figure  1:  Stochastic  PDE  for  the  disorder  problern. 

It  can  also  be  proved  that  the  likelihood  fuDCtion 
r  L[r]  is  smooth,  provided  the  change  can  only 
occur  in  the  drift  coefficient,  i.e.  ho  =  hi.  Actually, 
using  the  two-^ided  stochastic  calculus  developped  in 
Pardoux  [10] 

d{plv})  =  (Llplvl)dt  +  (hlplv})dY, 

-{plL,v})dt-(plhlvl)dY, 

=  (p°,[La- Li]v})dt  . 

Bayesian  change  detection 

The  unnormalized  conditional  distribution  of  the 
compound  state  {Xt,9,)  given  observations  in  is 
defined  by 

tor  i  =  1,2,  ■  ■  ■  ,N  ,  where  in  this  section  the  process 
{Zt[S] ,  0  <  <  <  T}  is  defined  by 

z;ie]ioxpy\i{Xr)dYr  |V(A-,)|='drJ 

and  Zi[S\  =  Z°{9\  .  In  addition  {p5,0<t<T)  sat¬ 
isfies  the  following  system  of  coupled  Zakai  equations 

N 

dp\  =  L'iP'i  di  +  h'p]  dY,  +  ^  i/,  dt  ,  (16) 

j=i 

where  Q  =  {ji,,}  is  the  jump  intensity  matrix  for 
the  Markov  process  {^i  ,  0  <  f  <  T).  Note  that  in 
system  (16)  the  coupling  occurs  only  through  zero- 
order  state-independent  coefficients. 

The  unnormalized  marginal  conditional  distribu¬ 
tion 

(p;,i)  =  c,.p(fl,  =  .|y,). 

can  be  used  to  compute  the  maximum  a  posteriori 
(MAP)  estimate 

€  argmax  {p\,  1)  . 
l<i<N 

Assuming  that  the  jump  intensity  matrix  is  of  the 
form  c  ■  Q  where  e  >  0  is  a  small  parameter  -  which 


means  that  the  frequency  of  the  jumps  is  small  -  it 
is  possible  to  obtain  an  asymptotic  expansion  of  the 
unnormalized  conditional  distribution  in  powers  of  c. 

5  Asymptotic  statistics 

Some  off-line  statistical  procedure  based  on  likeli¬ 
hood,  have  been  presented.  Whether  these  statistical 
procedures  actually  provide  good  results,  has  to  be 
investigated  in  some  asymptotic  sense.  Two  kind  of 
asymptotics  are  generally  considered  in  the  statistics 
of  random  processes,  see  Kutoyants  [5] 

•  the  small  noise  asymptotics,  where  the  noise  co- 
varizmces  are  of  order  y/e,  and  c  is  sent  to  zero, 

•  the  long-time  asymptotics,  where  the  observa¬ 
tion  horizon  T  is  sent  to  infinity. 

This  section  is  devoted  to  presenting  a  recent  result  on 
the  consistency  of  the  MLE  in  the  small  noise  asymp¬ 
totics,  see  James-LeGland  [4]. 

Statistical  model  On  the  canonical  space  Q  = 
C([0.T];  are  given 

a  pair  of  stochastic  processes  {Xt ,  0  <  t  <  T] 
and  {>'( ,  0  <  f  <  T)  taking  values  in  R”’  and 
R'*  respectively, 

■  for  each  c  >  0,  a  family  At'  =  ,  ^  €  ©}  of 

probability  measures. 

such  that  under  Pu.t 

dXt  —  6^(Ai)d/ +  ' 

(IT) 

dY,  =  h,{X,)dt  +  ./idVf‘ 

where  {W‘‘ ,  0  <  <  <  T)  and  {Vf  ‘ ,  0  <  t  <  T] 
are  independent  Wiener  processes  of  appropriate  di¬ 
mension,  with  covariance  matrix  /,  and  the  random 
variable  Xq  is  independent  of  the  Wiener  processes, 
with  probability  distribution  pJ'‘(x)dj*.  It  is  assumed 
that  the  initial  density  is  of  the  form 

p;  '(i)  =  C,,.  •exp(-iso*(x)}  , 

where  the  function  Sj  has  an  unique  minimizer  Xq. 

Limiting  deterministic  system  For  any  9  €  &, 
consider  the  following  deterministic  differential  sys¬ 
tem 

f  if  =  i#(xf)  ,  xj  =  ij 

(£')  < 

I  !/f  =  fi»(xf)  ,  »o  =  0 

which  is  obtained  from  (17)  by  sending  c  to  zero.  This 
defines  a  family  Ad®  =  {(E') ,  9  6  6}  of  deterministic 
differential  systems. 


Actually,  the  following  convergence  in  probability 
of  the  experiments  holds 

P,,.(  sup 

0<*<T 

Deterministic  parameter  estimation  Assume 
that  a  trajectory  {yf ,  t  >  0}  ia  observed,  which  is 
actually  the  output  of  some  deterministic  differentia! 
system  (S")  in  the  model  for  some  unknown 
a.  The  problem  is  to  estimate  the  parameter  ^  €  6, 
based  on  observations  {yf  ,  t  >  0}. 

The  model  is  said  identiJiabU  on  [0,  T|,  if  for 
all  ^  there  exists  f  6  [0, T]  such  that  #  yf, 
i.e.  different  values  of  the  parameter  give  different 
output  trajectories.  In  other  words,  the  mapping 
^  {yf  «  0  ^  f  ^  *a  injective.  The  determinism 
tic  parameter  estimation  problem  consists  of  invert¬ 
ing  this  mapping.  This  can  be  expressed  in  terms  of 
the  following  variational  problem. 

Introduce  the  cost  functional 

Jo 

Jo  Jo 

for  absolutely  continuous  ^  6  C([0,7^ ;  R*”),  and  the 
following  least-squares  functional 

(aW  =  inf{7^(^,  T)  :  C((0,71 ;  R"*))  . 

The  deterministic  parameter  estimate  (DPE)  is  de¬ 
fined  by 

Ma  =  argmin  ft,  [6]  . 
see 

The  following  consistency  result  can  be  proved, 
which  relies  on  PDE  techniques  for  large  deviations 
{vanishing  viscosity  theorem)  and  on  the  convergence 
in  probability  of  the  experiments. 

Theorem  5.1  For  alt  a  €  ©,  i/  the  deterministic 
model  is  identifiable,  then  any  ULE  sequence 
{S' ,  £  >  0}  converges  tn  Pa., -probability  to  the 
“true”  parameter  :  for  all  S  >  Q 

P„,.(|S‘-o|>S)liio  . 


Another  issue  is  the  rate  of  convergence  of  the  MLE 
sequence  to  the  true  value  of  the  parameter.  The  so¬ 
lution  to  this  question  relies  on  proving  a  local  asymp¬ 
totic  normality  (LAN)  result.  This  is  currently  under 
investigation. 

Other  problems  should  also  be  considered,  includ¬ 
ing  ;  large  time  asymptotics,  recursive  (on-line)  esti¬ 
mation,  and  eidaptive  filtering. 
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Abstract 


In  this  paper  we  provide  a  consistency  result  for  the  MLE  for  partially  observed  diffu¬ 
sion  processes  with  small  noise  intensities.  We  prove  that  if  the  underlying  deterministic 
system  enjoys  an  identifiability  property,  then  any  MLE  is  close  to  the  true  parameter 
if  the  noise  intensities  are  small  enough.  The  proof  uses  large  deviations  limits  obtained 
by  PDE  vanishing  viscosity  methods.  A  deterministic  method  of  parameter  estimation  is 
formulated.  We  also  specialize  our  results  to  a  binary  detection  problem,  and  compare 
deterministic  and  stochastic  notions  of  identifiability. 

Key  words:  Parameter  estimation,  nonlinear  filtering,  large  deviations. 


1980  subject  classifications:  62F12,  93E10,  93E11,  60F10 


R4sum^ 


On  demontre  la  consistance  du  maximum  de  vraisemblance  pour  I’estimation  de  para- 
metres  dans  les  processus  de  diffusion  partiellement  observes,  dans  le  cas  de  petits  bruits. 
Si  le  systeme  deterministe  sous-jacent  est  identifiable,  alors  tout  estimateur  du  maximum 
de  vraisemblance  est  proche  de  la  vraie  valeur  du  parametre  inconnu,  pourvu  que  les  bruits 
soient  assez  petits.  La  demonstration  utilise  des  resultats  de  grandes  deviations,  qui  sont 
obtenus  par  des  techniques  d’EDP  (vanishing  viscosity).  On  applique  ce  resultat  a  un 
probleme  de  detection  sequentielle,  et  on  compare  les  notions  deterministe  et  stochastique 
d’identifiabiiite. 

Mots-Cies:  Estimation  de  parametres,  filtrage  non-lineaire,  grandes  deviations. 


Contents 


1  Introduction  1 

2  Maximum  Likelihood  Estimation  2 

3  Deterministic  Parameter  Estimation  5 

4  Consistency  Result  for  MLE  11 

5  Binary  Sequential  Detection  15 


A  Appendix 


21 


1 


Introduction 


In  this  paper  we  provide  a  consistency  result  for  the  Maximum  Likelihood  Estimator 
(MLE)  for  partially  observed  diffusions  with  small  noise. 

The  problem  of  computing  the  MLB  for  partially  observed  diffusions  has  received  re¬ 
cent  attention.  Dembo  and  Zeitouni  [7]  have  investigated  the  EM  algorithm,  and  Campillo 
and  Le  Gland  [2j  have  compared  this  algorithm  with  a  direct  maximization  approach.  Of 
course,  the  goal  of  such  efforts  is  to  compute  a  good  estimate  of  the  unknown  parameter. 
The  success  or  otherwise  of  such  algorithms  depends  on  whether  the  MLE  itself  is  a  good 
approximation  to  the  unknown  parameter.  The  purpose  of  this  paper  is  to  address  this 
question  of  consistency  when  the  diffusion  and  observation  noise  intensities  are  “small” . 

Our  result  was  in  part  inspired  by  some  large  deviations  limit  results  for  nonlinear 
filtering  in  Hijab  [11],  James  and  Baras  [12],  James  [13].  The  theory  of  large  deviations 
for  diffusions  with  small  noise  is  presented  in  Freidlin  and  Wentzell  [10].  We  exploit  the 
fact  that,  on  finite  time  intervals,  the  diffusion  X  with  observations  Y  aie  “close”  to  a 
deterministic  process  with  observations  y".  We  formulate  a  deterministic  method  of 
parameter  estimation  for  this  deterministic  process. 

We  prove  that  if  the  underlying  deterministic  system  is  identifiable  and  if  a  is  the  true 
parameter,  then  any  MLE  d‘  is  close  to  a  if  e  >  0  is  small  enough.  Our  proof  uses  PDE 
vanishing  viscosity  methods  and  Laplace’s  asymptotic  method. 

As  an  application  of  our  results,  we  study  a  binary  sequential  detection  problem, 
discussed  in  Baras  and  La  Vigna  [1],  when  the  noise  intensities  are  small.  Deterministic 
and  stochastic  notions  of  identifiability  are  compared  in  the  context  of  threshold  decision 
policies. 
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2  Maximum  Likelihood  Estimation 

On  a  measurable  space  (H,  we  consider 

•  for  each  £  >  0,  a  family  0  g  0}  of  probability  measures, 

■  a  nair  of  stochastic  processes  X  =  {Xj ,  0  <  f  <  T}  and  Y  =  {Yt ,  0  <  t  <  T} 
taking  values  in  R™  and  R"*  respectively, 

such  that  under 

dX,  =  be{X,)dt  +  dWf-^  ,  Xo  ~  pS’'(i)  dx  , 
dYt  =  h»{Xt)dt  +  dK.*-'  ,  n  =  0  , 

where  {Wf*  >  0  <  ^  <  T'}  and  {Vt'‘ ,  0  <  t  <  T}  are  independent  Wiener  processes,  with 
covariance  matrices  £/„  and  eli  respectively,  and  Xq  is  a  random  variable  independent 
of  the  Wiener  processes,  with  density  of  the  form 

i  exp{-i5^{x)}  .  (2.1) 

The  set  of  parameters  0  C  R*”  is  compact,  and  the  coefficients  satisfy  the  following 
hypotheses 

(i)  for  all  e  ^  e,  be  €  and  hg  g  C^CR”*,  R-*), 

(tt)  for  all  9  ^  Q,  Sg  is  convex,  locally  Lipschitz  continuous,  and  for  some 
Xg  g  R"*,  Sg(xg)  =  0  ,  Sg(x)  >  0  if  X  ^  Zg .  ^ssumc  also 

c,  +  C.|z|^  >  5»(x)  >  Cz|x|  -  , 

for  all  X  e  R"  ,  0  g  0 . 

Further,  the  functions  be,  he  and  Sj  depend  continuously  on  the  parameter  9  in  the  sense 
that 


(hi)  for  each  6  >  0,  R  >  0,  there  exists  -y  >  0  such  that  \9'  -  0|  <  7  implies 

sup  |6s'(x)  -  6s(x)|  <  6  ,  sup  Ihs-tx)  -  h,(i)l  <  6  , 
leR™  leR" 

sup  |5S'(i)  -  Su(x)|  <  i  . 
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There  is  no  loss  in  generality  in  assuming  that  is  the  canonical  space  C([0,  T\ ;  R’"'*''*), 
in  which  case  X  and  Y  are  the  canonical  processes  on  C([0,T] ;  R"*)  and  (^([O.r] ;  R”*) 
respectively,  and  P$_t  is  the  probability  law  of  (Jf,  T). 

It  is  assumed  that  only  Y  is  observed.  Let  ^  denote  the  tr-algebra  generated  by 
the  process  Y  on  CdO,  T] ;  R"*).  The  probability  measures  in  M‘  are  mutually  absolutely 
continuous,  and  the  log-likelihood  function  for  estimating  the  parameter  6  in  the  statistical 
model  given  yr,  can  be  expressed  (note  the  minus  sign)  as 

-e‘(e)  =  eiogEl{z^-‘\yT). 

Here  Pj ,  is  a  probability  measure  equivalent  to  Pg^c,  with  Radon-Nikodym  derivative 
^  " ^p\y^h;(X.)dY, -^£\hsiX.)\^ds^  , 

so  that  under  Pj  ^ 

dX,  =  bs{Xt)dt  +  dWf  ‘  ,  Xo  ~  p2'(x)dx  , 

where  {VT*' ,  t  >  0}  and  {K, ,  t  >  0}  are  independent  Wiener  processes,  with  covariance 
matrices  elm  and  eld  respectively,  and  the  random  variable  Xq  is  independent  of  the 
Wiener  processes,  see  [2]. 

The  maximum  likelihood  estimate  (MLE)  of  the  parameter  6  in  the  statistical  model 
M‘,  is  defined  on  the  canonical  space  C([0, 7^ ;  R**)  by 

9‘  6  argminP{^)  . 
see 


The  likelihood  function  can  be  computed  through  the  solution  of  the  Zakm  equation 

dp*'(x,t)  =  lL;y-‘](x,t)dt  +  h;{x)pO‘(x,t)dY,  ,  (2.2) 

where  LJ,  is  the  adjoint  operator  of  the  infinitesimal  generator  Ls,c  of  the  diffusion  process 
X  under  the  probability  measure  Pg^ 


I  —  ‘p  ^  ^ 

L»,.  -  2- 


,  dx,dx,  ^  ^*dx. 


Indeed 

(’{B)  =  -eiogj^y^(x,T)dx  .  (2.3) 

The  filtering  problem  is  discussed  in  detail  in  Liptser  and  Shiryayev  [15].  The  following 
lemma  is  proved  in  the  Appendix. 
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Lemma  2.1  The  log-likelihood  function  depends  continuously  on  the  parameter 

0  6©  o.s. 


Let  now  8  be  fixed.  When  s  j  0,  the  following  weak  convergence  result  holds  on 
C{[0,T] ;  R™+‘'): 

^  ^(**,v*)  > 

where  (x^,y*)  is  given  by  the  deterministic  differential  system 


(E*) 


I  if  =  i>s(xf)  . 
I  y?  =  Hxt) , 


!/o  =  0. 


In  particular,  for  all  0  €  6  ,  £  >  0 


Ps,,(  sup  |y,  -  yfl  >  6) 

0<t<T 


tio 


(2.4) 


see  Freidlin  and  Wentzell  [10]. 


Remark  2.2  As  long  as  e  >  0,  the  probability  measures  in  M‘  are  mutually  absolutely 
continuous,  which  tJlows  us  to  define  the  log-likelihood  function  -f'(0).  On  the  other 
hand,  asymptotically  when  e  1  0,  these  probability  measures  look  more  and  more  mutually 
singulair,  which,  together  with  an  identifiability  property  of  the  underlying  deterministic 
system,  indicates  that  the  MLE  may  be  consistent.  Actually,  this  result  will  be  proved 
below. 

The  purpose  of  the  next  Section  is  to  consider  the  problem  of  estimating  the  unknown 
parameter  0  in  the  deterministic  model  Ad®  =  {(E*) ,  0  6  6}. 
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3  Deterministic  Parameter  Estimation 

Consider  the  family  =  {(E*),  8  G  0}  of  deterministic  differential  systems 

r  i*  =  6»(if)  ,  I?  =  X* 

(£')  {  (3.1) 

[  jif  =  ,  1/0  =  0. 

Note  that  for  all  G  0,  (E*)  describes  the  weak  limit  as  e  |  0  of  the  family  of  probability 
measures  {Po,t ,  e  >  0}. 

The  problem  is  to  estimate  the  unknown  parameter  B  on  the  basis  of  an  observation 
record,  which  is  supposed  to  be  the  output  of  some  deterministic  differential  systems  in 
Af“.  Introduce  the  following  definition: 

Definition  3.1  The  model  is  identifiable  on  [0,  T]  tf  for  alls'  ^  B  in  0,  there  exists 
t  G  [0,T]  such  that 

Vi  i'Vt  ■ 

In  other  words,  the  mapping  B  i->  {yf ,  0  <  t  <  T}  is  injective.  The  deterministic 
parameter  estimation  problem  consists  of  inverting  this  mapping.  This  can  be  expressed 
in  terms  of  the  following  variational  problem. 


Define  the  following  functional  on  CdO.r] ;  R"") 

,  “  (3.2) 

+1 1  \yt-hs{Q?ds-\f^\y^\Us, 

if  ^  is  absolutely  continuous,  v/*(^,t)  =  +oo  otherwise.  For  all  x  6  R"  set 

W»(x.<)  =  inf{^«.0  :  C<  =  ^}  •  (3.3) 

The  value  function  W^{x,t)  is  continuous  in  (x,  t)  and  is  the  unique  viscosity  solution  of 
the  Hamilton-Jacobi  equation  [12] 

^W^ix,t)  +  Hiix,t,nw^ix,t))  =  0,  W*{x,0)  =  5»(x),  (3.4) 

where  the  Hamiltonian  P*(i,t,  A)  is  defined  by 

Hi(x,  t,  A)  i  max  {A*(6,(x)  +  u)  -  i|u|*}  -  i|y“  -  h,(x)|*  +  ijyrP 

=  b;{x)X  +  i|Ap  +  h;{x)y^  -  5|fi»(x)p  . 


5 


For  definitions  and  an  introduction  to  viscosity  solutions  of  Hcunilton- Jacobi  equations, 
the  reader  is  refered  to  Craindall  and  Lions  [3],  Crandall,  Evans  and  Lions  [5], 

Consider  the  following  functional,  defined  on  6  by 

T)-.  is  C-([0,  T] ;  BT)}  ■  (3.6) 

A  deterministic  estimate  (DPE)  of  the  unknown  parameter  0  in  the  model  on  the 
basis  of  the  observation  record  {yf  ,  0  <  t  <  T}  is  defined  by 

€  Ma  =  argmin^a(^)  •  (3.7) 

The  main  result  of  this  section  is  the  following: 

Theorem  3.2  If  the  model  Ad®  is  identifiable,  then  for  all  a  e  Q 

=  {a}  . 


Thus,  under  the  identifiability  hypothesis,  the  DPE  is  uniquely  defined  and  the  unknown 
parameter  can,  in  principle,  be  computed  exactly  from  (3.4),  (3.6),  (3.7).  Before  proving 

Theorem  3.2,  we  give  a  lemma  which  ensures  that  argmin  ta(^)  ^  0,  and  also  provides 

See 


useful  estimates. 


Lemma  3.3  For  all  a  €  Q 

(i)  there  are  constants  C  >  0,  C  >  0  such  that,  for  all  x  g  R”*,  0  S.  Q 
Ci\xf-¥C>W^{x,T)>C\x\-C'  , 

(li)  for  all  R  >  0,  b  >  0  there  exists  7  >  0  such  that  \0'  —  0\  <  f  implies 

sup  \w!:(x,T)-W^(x,T)\<6  , 

ziB{0,R) 

(ill)  the  mapping  0  i->  (■a(0)  is  continuous. 

Proof.  In  the  sequel,  every  constant  independent  of  0,o  g  ©  and  (x,t)  g  R”*  x  [0,7^ 
will  be  denoted  by  C  or  C.  For  any  absolutely  continuous  function  (  g  C{[0,T] ;  R") 
and  any  A  >  0,  we  have 
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and  by  Gron wall’s  lemma, 

161"  <  (|6I"  +  a  /  Kr|"dr)  exp{(t  -  s)/A}  .  (3.8) 

Since  sup  sup  |/i«(x)|  <  ^  it  follows  that,  for  all  a  £  6 
«€e  leR" 

^f{y:fds<C, 

and  hence  for  all  a,  0  €  0 

t)>-C  ,  (x,  t)  6  R™  X  [0,  T\  . 

Let  denote  the  Lagrangian  in  (3.2).  It  is  easy  to  prove  the  following  estimates 

i  I  -  b»Ur)\^dT  +  i  jf‘  dr  <  J'  LiUr,ir,r)dT  +  C  , 

^  /  \ir  -  6s(f.)|"  dr  <  J‘  dr  +  J‘  |6,(^0I"  dr  <  jf'  dr  +  C  . 

In  particular 

i  J‘  I  ds-C<  Jia,  ()  <  Sldo)  +  J‘  161"  ds  +  C. 

Proof  of  (i):  Setting  ^  =  x  on  [0,T],  gives  for  0  <  t  <  T 

<  d^(6t)  <  Sl{x)  +  C  <  C,|x|*  +  C  . 

Choose  A  >  0  such  that  N  —  T/ A  is  an  integer  and  4eCi  A  <  j.  For  n  =  1,. . .  ,N  the 
Dynamic  Programming  principle  implies 

W^{z,nA)  =  inf  (vF(6n-i)a,(n  -  1)A)  +  f  L*(6,6.s)ds  :  6n  =  4  • 

{  f  ./(n-I)a  J 

Given  i  >  0,  recursively  select  £  (7(10,  T] ;  R*")  for  n  =  iV, . . . ,  1  as  follows:  =  i, 

Co'  =  Co  and 

<(6"-i)o,  (n  -  1)A)  +  Li(C, 6".  ^)ds  <  <(Co.  »A)  +  ^ 

(3.9) 

<  C'llCol"  ^  ■ 

Then 
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and  from  (3.8) 

ICJ*  <  (lCn-:)Ar  +  A  e  <  e  +  l\(^f  +  i(C  +  ^)/C,  , 

which  implies 

^  2e  |fJ,_i)A|^  +  (C  +  'j^)/Ci  ■  (3.10) 

Define  G  C([0,  T] ;  R”*)  by  for  t  6  [(n  -  1)A,  nA],  n  =  1, ....  AT.  Then  ^ J  =  i 

and  by  iterating  (3.10)  we  obtain 

1^0*1*  +  C"- 

Now  also,  by  iterating  (3.9) 

Jt(e,T)<W^{x,T)  +  6  ,  (3.11) 

cind  consequently 

Ki^,T)  >  Jt{e,T)  -S>  SSi^S)  -C>  Clx\  -  C  , 

which  proves  (i). 


Proof  of  (ii):  Let  R  >  0,  i  >  0  and  *  €  5(0,  R).  Choose  as  in  (3.11).  Then,  from  the 
above  estimates, 

f\i*\Ua<CR. 

Jo 

Using  (3.8)  we  deduce  that  if  x  6  5(0, 5),  then  there  exists  i?  >  0  such  that  G  5(0, 5') 
for  ciU  S  G  0.  Therefore 

<'(x,T)  -  Wt{x,T) 


=  5?'(?*)  -  SliC)  +  1  le  -  da-\j^  \il  ~  6s(f;)P  ds 

H  [  |y“  -  |j/“  -  ds  +  \5. 

Now,  if  |0'  —  is  small  enough 


|5S'(^o*)-5oU*)l<J« 


1 

2 
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Also 


i  I  r  1^?  -  ds  -  r  lit  -  )!=*  ds\ 

Jo  Jo 

+i  f  |6s'(^!)  -  6s(^?)l  |6«-(«!)  +  6»(^?)l  ds<^6, 

if  \6'  —  6\  is  small  enough.  Hence,  there  exists  7  >  0  such  that  1^'  —  ^|  <7  implies 

Wt{x,T)  -  Wi(x,T)  <  8  . 

Reversing  the  role  of  6'  and  6  proves  (ii). 

Finally,  (iii)  follows  from  (i)“(ii)  and  Lemma  A. 2.  □ 

Proof  of  Theorem  3  2.  From  (3.2),  (3.3)  and  (3.6)  we  have 

for  all  ^  €  0  and  {  €  C([0,T] ;  R”),  so  that  la{0)  >  Co^  From  (3.1)  we  have 

J:(x\T)  =  Ca  , 

so  that  for  all  d  €  0,  f<i(o:)  =  c„  <  ia(6).  Therefore  a  £  Ma- 

Assume  that  9  6  Ma-  Then  fa(?)  =  fa(a)  and 

iaW  =  inf{jf(e,T)  :  ^  €  C([0,ri ;  R”*)}  =  jI(IT)  , 
for  some  f  £  C([0,T] ;  R™),  since  Jt(-,T)  is  lower  semi-continuous.  Then  from  (3.2) 

(0  5f(fo)  =  0  , 

(”)  t  =  >  0<3<T  , 

[iii)  =  h^i,)  ,  0  <  s  <  T  . 

From  (i)  Co  =  therefore  by  (ii)  and  (3.1)  (,  =  xf  ,  0  <  s  <  T  .  Then  (iii)  aind 

(3.1)  imply 

yt  =  '  0<s<T. 

Now  since  the  model  is  identifiable,  this  equality  forces  9  =  a,  which  proves  the 
theorem.  □ 
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Remark  3.4  The  notion  of  identifiability  is  reminiscent  of  a  notion  of  observability  for 
nonlinear  systems,  which  also  has  a  variational  characterization,  see  James  [13]  [14]. 
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Consistency  Result  for  MLE 


The  main  result  of  this  paper  is  the  following; 

Theorem  4.1  For  all  a  Q 

(i)  any  MLE  sequence  {6‘  ,  e  >  0}  converges  in  Pa,c -probability  to  the  deterministic  set 
Ma-'  for  all  S  >  Q 

P.Ad{e\M^)>S)^0  , 

(li)  if  the  deterministic  model  is  identifiable,  then  any  MLE  sequence  {6‘ ,  £  >  0} 
converges  in  Pa,e -probability  to  the  “true”  parameter:  for  all  6  >  0 

PaA\S‘  -a\>6)i^0. 

The  proof  of  this  theorem  depends  on  a  technical  extension  of  large  deviations  limit 
results  for  nonlinear  filtering  contained  in  James  and  Baras  [12],  Jcunes  [13].  We  need  to 
show  that  certain  limits  are  uniform  in  the  parameter  6  6©-  The  key  technical  lemma  is 
the  following: 

Lemma  4.2  The  sequence  >  0}  converges  in  Paj -probability  uniformly  in 

6  e  &  to  (A^)-  all  6  >  0 

P^A^np\(’(9)  -  eA9)\  >  S)  ^  0  . 
see 

We  next  prove  Theorem  4.1  using  Lemma  4.2;  the  remainder  of  this  section  is  con¬ 
cerned  with  proving  Lemma  4.2. 

Proof  of  Theorem  4.1.  By  Lemma  A.l  for  all  6  >  0  there  exists  7  >  0  such  that 

{sup  \F{9)  -  <  7}  C  {d(9\  MA  <  b}  . 

see 

Therefore,  by  Lemma  4.2 

P.Ad(9‘,  MA  >6)<  P„..{sup|r{(?)  -  4(^)1  >  7)  0  , 

See 


which  proves  (i). 

The  proof  of  (ii)  follows  at  once  from  (i)  and  Theorem  3.2. 


□ 
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As  in  James  zind  Baras  [12],  James  [13],  we  employ  the  vanishing  viscosity  method  of 
Evans  and  Ishii  [8].  We  proceed  by  a  logaritlunic  change  of  variables  used  by  Fleming 
and  Mitter  [9].  Define 

W‘'‘{x,  t)  =  -£  logp*’'(i,  t)  .  (4.1) 

The  J^-measurable  random  variable  t)  +  hl(x)Yt  can  be  extended  to  a  continuous 

function  defined  on  the  whole  canonical  space  Qo  =  {»;  £  C([0,T] ;  R"*)  :  tjo  =  0},  which 
we  denote  by  u*’'[i;](i,t),  see  [9]  and  [12).  For  any  fixed  r)  S  Uo 


u«’'[7/]  €  C"'‘(R’"  X  [0,T] ;  R) 

is  the  unique  solution  of  the  Hamilton-Jacobi-Bellman  equation 

=  5o(x)  -  elogC»,« 

where  the  Hamiltonian  /f*’‘[77](i,t.  A)  is  defined  by 

/f'''W(x,<,A)  i  gl{x,rH)\  +  i|A|^  -  V»-^{x,rH)  , 
V*'‘{x,Ti)  =  V^{x,Ti)  +  5ei;*Ahs{x)  +  £divff»(x,»?)  , 
V^{x,rj)  =  i|hs(x)p  +  b;r,-Dhe(x)  -  §(Dhs(x))*7?,*  Dke{x)  , 


(4.2) 


(4.3) 


ge{x,ri)  =  bt(x)  -  ii'Dhg(x)  . 


Next,  for  t;  €  Ho  let 

u*[7;]€C(R’"x[0,r];R) 

denote  the  unique  viscosity  solution  of  the  Hamilton- Jacobi  equation 

+  =  u»’'[7,](z,0)  =  S»(x)  (4.4) 

where  the  Hamiltonian  //'*[7;](i,t,  A)  is  defined  by 

f/»[T,](x,t,A)  ^  g;(x,rH)\  +  ||A|^  -  V>(x,jh)  •  (4.5) 

Lemma  4.3  We  have 

Umu*’'[»7](x,t)  =  u*[77](x,t)  , 

uniformly  in  8  €  Q  and  t  €  [0,T]  and  uniformly  on  compact  subsets  of  7)  €  flo  “nd 

X  e  R™. 
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Proof.  The  following  estimates  are  obtained  as  in  James  and  Baras  [12],  James  [13], 
using  methods  introduced  in  Evans  and  Ishii  [8],  Crandall  and  Lions  [4].  Let  >  0  and 
A’  C  flfl  be  compact.  Then  if  e  >  0  is  sufficiently  small,  we  have 

<  c 

|u*’'[77l(x,t)  -u'’''[i7l(i,s)|  <  C(v/£l«  -  s]2  +  jt  _  s]) 

for  some  constant  C  >  0  and  for  all  fl  e  ©,  t  €  [0,T],  tj  6  K  and  x  e  B(0,R).  By 
the  Arzela-Ascoli  theorem,  there  is  a  subsequence  £*10  such  that  u*’'‘[r;]  converges 
uniformly  on  B{Q,  R)  x  [0,T]  to  a  continuous  function  w.  This  function  satisfies  the 
Hamilton-Jacobi  equation  (4.4),  and  by  uniqueness,  w  =  u*]?;]  (Crandall  and  Lions  [3]). 
Hence  u*’'[r?]  — *  u*[t;]  as  £  (  0. 

Now  u*[t)]  is  a  continuous  function  ofr]eK,de  &  (see  the  proof  of  Lemma  3.3  (ii)). 
Using  this  fact  and  the  uniform  estimate  above  we  conclude  that  the  convergence  is 
uniform.  □ 

Now 

w^*-‘(x,t)  =  u*-‘[r](x,t)-h;(*)y, 

and 

w^i(x,t)  =  «viK0-*;w- 


Lemma  4.4  We  have 

UmH^*’'(x,t)  =  W^{x,t) 

Pa.e~^‘^<^hability  uniformly  in  S  ^  0,  t  €  [O.T]  and  uniformly  on  compact  subsets  of 
X  e  R-". 

Proof.  Let  p  denote  a  metric  on  C'(R"*  x  [0,r] ,  R)  corresponding  to  uniform  conver¬ 
gence  on  compact  subsets.  By  (2.4),  it  is  enough  to  show  that  for  each  (5  >  0 

P„,,(supp(u‘''‘[y],u«[y“])  ><5)  0  . 

Choose  ^  >  0  such  that  ||7?  -  j/°||  <  P  implies 

supp(u*[T;|,u*[y"])  <  \b  . 
see 

From  Lemma  4.3,  if  [[tj  -  y“||  <  /?  and  0  <  £  <  £o  then 

8up/j(u*''[»j],u*[q])  <  . 
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5  Binary  Sequential  Detection 

In  this  section  we  discuss  some  aspects  of  a  binary  detection  problem  studied  by  Baras 
and  LaVigna  [1],  when  the  noise  intensities  are  small. 

Let  0  =:  {0, 1}  and  let  X  and  Y  be  the  signal  and  the  observation  processes  described 
in  Section  2.  For  e  >  0  fixed,  we  consider  the  two  hypotheses  Ho  and  Hi.  Under  Ho  the  law 
of  {X,  Y)  is  Po,c,  whilst  under  Hi  the  law  of  (X,  Y)  is  Pi  ^-  The  problem  is  to  determine 
which  hypothesis  is  true,  that  is  to  detect  the  signed.  In  this  section,  il  =  C'([0,  oo),  R”*"''''). 

A  key  technical  assumption,  essentially  an  identifiability  condition,  used  in  Baras  and 
LaVigna  [1]  is  the  following 

/  |V£(t)  -  =  oo  a.s.  (5.1) 

JQ 

where 

ho,e(t)  i  EeAheiX,)  1  y.)  , 

and 

yt  =  <7(y. ,  0  <  s  <  f )  . 

The  deterministic  analogue  of  (51)  is 

/  ly?  -  =  00  •  {5  2) 

Jo 

Clearly,  (5.2)  implies  that  the  model  M°  defined  by  (3.1)  is  identifiable.  In  fact,  if 

<r  =  inf{r  >  0  ;  f  lyf  -  dt  >  0}  , 

Jo 

then  is  identifiable  on  each  interval  [0,  T]  with  T  >  a. 

The  following  result  is  a  consequence  of  Theorem  4.1. 

Theorem  5.1  Assume  (5.2)  holds  andT  >  a.  Define  the  MLE  S'  for  the  interval  [0,  Tj. 
Then,  for  a  =  0, 1 

Pa.e0‘  =  a)  —  1  . 


In  [1],  Baras  and  LaVigna  use  a  threshold  decision  poUcy  to  decide  which  of  the 
hypotheses  is  valid.  Define  the  likelihood  ratio 

A5.  =  exp  i  I  fj'hiAt)  -  hoAt)V  dYi  -  |  /jl*i,e(t)l*  -  I  • 
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Note  that  as  £  j  0, 


flogAf  X 


~2  f  1%'  ~  under  Ho 

Jo 

+  2/^  lirf-jf/pdt  under  Hi 


A  threshold  policy  u‘  =  consists  of  a  ,  t  >  0}-stopping  time  t'  and  a 

“Measurable  {0,  l}-valued  random  variable  defined  by 

t'  =  inf{r>  0  :  A^  ^(e-Z^.e^/')}  , 

f  1  if  a;.  =  e‘/'  , 

6'  = 

[  0  if  A^.  =  e“/'  , 

for  some  constants  a  <  0  <  6.  If  =  1  we  decide  that  hypothesis  Hi  is  valid  (i.e.  that 
9  =  1),  whilst  if  ^'  =  0  we  decide  Hp  (i.e.  6  =  0).  Of  course,  our  decision  may  be  in  error. 
Define  an  error  probability  for  the  policy  u‘ 

e(u‘)  =  PoAS'  =  1)  +  Pu{i‘  =  0)  . 


Theorem  5.2  If  (5.1)  holds,  then 


e(u')  ^  0  . 


Proof.  Under  assumption  (5.1),  Baras  and  LaVigna  [1]  prove  that 

r'  <  00  a.s 


1  _  goAfeV' _  i\ 

Fo,(i'  =  l)  =  — - p^16‘  =  o)  =  —A - 

'  eVt  _  e“/t  *■''  '  e‘A  -  e“/' 

Since  a  <  0  <  6,  the  conclusion  follows.  n 

Thus,  assuming  (5.1),  the  probability  of  making  an  incorrect  decision  converges  to 
zero  as  £  I  0,  and  so  (5.1)  can  be  viewed  as  an  identifiability  criterion  for  the  statistical 
model  Ad'  =  {Po,c ,  Pi.t}- 


We  can  define  a  deterministic  threshold  policy  u  =  (t,  S)  as  follows.  Define 
^r  =  5^  |yr-V«l^dt- 
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Let  a  <  0  <  b  and  set 


r  =  inf{r  >0  :  Ft  ^  (a,  6)}  , 

s  =  l'  '' 

[  0  if  f  ,  =  a  . 

Theorem  5.3  Assume  that  (5.2)  holds.  Then  for  any  threshold  policy  u  =  [t,6)  with 
a  <  Q  <  b,  we  have  r  <  oo  and 

6  =  1  if  and  only  i/  Hi  is  valid  , 

6  —  0  if  and  only  if  Ho  is  valid  . 

Proof.  Under  Hi  ,  j/(  =  y)  and  for  T  >  0 

FT  =  \[\y^,-y]?dt>0. 

By  (5.21,  there  exists  Ti  >  0  such  that  Ftx  =  b.  Consequently  t  <T\  and  6  =  1. 
Similarly,  under  Ho  ,  =  y?  and  for  T  >  0 

Ft  =  -i  {  \y}  -  <  0. 

JO 

We  conclude  again  r  <  oo  and  6  =  0.  □ 

Thus  a  deterministic  threshold  policy  always  makes  the  correct  decision  under  the 
(stronger)  identifiability  condition  (5.2). 

To  '  impute  u'  (approximately),  Baras  and  LaVigna  [1]  use  a  numerical  solution  of 
th,  ’.ai  equation.  The  above  suggests  an  approximation  when  e  )  0  is  small.  Now 

Ft  =  FT(y°,y^;y)  . 

Compute  approximations  y*  to  y°,  y‘  by  numerically  integrating  the  differential  system 
(3.1).  Set 

H  =  FT{f,y^-,Y), 

where  Y  is  the  noisy  observation  record.  Now  define,  for  o  <  0  <  6 

f'  =  inf{T  >  0  :  ^  (a,  b)}  , 

f  1  if  F!.=b, 

I  0  if  F?.  =  o  . 
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If  the  integration  is  sufficiently  accurate,  then  we  expect  for  o  =  0, 1 


PaAi'  =  S‘)^0. 

Note  that  |a|,  b  can  be  increased  to  increase  the  level  of  confidence. 

Remark  5.4  In  practice,  the  initial  condition  Xo  is  not  known,  so  that  one  would  have 
to  estimate  Xq  also,  for  instance  using  an  observer. 
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A  Appendix 


This  Appendix  contains  some  technical  results  used  in  the  paper,  and  a  proof  of  Lemma 

2.1. 

Lemma  A.l  Let  A  C  IV  be  compact.  For  any  4>  €  C(A,R)  define  the  set 

=  argmin(^(A)  . 

AEA 

/ifl  c  C(A,R).  Then  for  alia  >  0  there  exists  /3  >  0  such  that 

sup  |/(A)  -  p(A)l  <  0  implies  VA  £  M(g)  ,  d(A,  M(f))  <  a  . 

A 

Proof.  If  not,  there  exists  a  >  0  and  a  sequence  {g^ ,  i  >  0}  such  that 
sup  |/(A)  —  g,(A)|  — >  0  as  i  — •  00  , 

ASA 

and 

d(Aj,  M(f))  >  a  for  some  A*  6  M{gi)  . 

Since  A  is  compact,  we  can  assume  that  A;  — *  A*  6  A  as  t  -*  oo.  Consequently 

d{X’,M{f))>a  .  (A.l) 

Let  A(/)  e  M(f).  Then 

f(K)  =  fiXf))  +  [S.(A(/))  -  fiMm  +  [<7.(A.)  -  <?.(A(/))1  +  (/(A.)  -  s.(A.)! 

<  /{A(/))  +  1<?.{A(/))  -  /(A(/))]  +  [/(A.)  -  g,C\)] 

<  /(A(/))  +  2supl/(A)  -  g,(A)|  , 

AeA 

Sending  i  — *  oo  we  obtain  /(A*)  <  /(A(/)).  That  is  A*  e  M(/)  which  contradicts  (A.l). 

□ 

Lemma  A. 2  Let  A  C  IV  be  compact,  and  6  C(R”',  R)  be  such  that 

(a)  there  are  constants  C  >  0,  C'  >  0  such  that,  for  all  z  6  R”*,  A  e  A 

F\z)  >  C\z\  -  C  , 
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(b)  for  all  R  >  a,  S  >  0  there  exists  7  >  0  such  that  |A'  —  A|  <7  implies 

sup  |F^(2)-i^-''(z)|<i  . 

teB(o,R) 


Define  =  inf  F*‘(z).  Then 
jeR" 

(i)  there  exists  a  constant  R  >  0  such  that,  for  all  X  E  h. 

argmin  F^{z)  C  B(0,  R)  , 
zeR™ 

(ii)  the  mapping  A  1— >  m*  is  continuous. 


Proof.  For  any  A  e  A  let  6  argmin  F^(z).  The  existence  of  z^  follows  from  the 

zeR"* 

continuity  of  F^  and  the  coercivity  hypothesis  (a).  Moreover 


and  thus  for  all  A  6  A 


m"  =  F\z^)  >  C|z^|  -  C  , 

k*l< 


.Ai  ..  rn^  +  C 


Fix  Aq  6  A.  By  (b)  for  each  ^  >  0  there  exists  7  >  0  such  that  (A  —  Aoj  <  7  implies 

<m^  +6  . 

Then  |A  -  Ao|  <  7  implies 


z^  e  B(0,  R)  ,  with  R  = 


A  m^°  +  S  +  C 


which  proves  (i). 


By  (b)  again,  this  implies 

<  F^{z^)  =  m-"  +  -  F^(2^)] 


<m''+  sup  |F-''’(2)  -  F''{«)|  <  +  «  , 

z6B(0.R) 

and  the  proof  of  the  lemma  is  now  complete. 


□ 


The  next  lemma  is  a  varicint  of  Laplace’s  asymptotic  method. 
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Lemma  A. 3  Let  A  C  TV  be  compact,  and  F*,G^  6  C(R”*,R)  ^e  sack  that 
(a)  there  are  constants  C  >  Q,  C  >  0  such  that,  for  all  z  €  R"*,  A  €  A 
F\z)  >  C\z\  -  C  ,  G\z)  >  C\z\  -  C  , 

(h)  for  all  R  >  0,  6  >  0  there  exists  7  >  0  such  that  |A'  —  A|  <  7  implies 

sup  \F\z)~  F^'{z)\<6  ,  sup  |G^(i)  -  G^'(2)|  <  6  . 

J€fl(0,R)  i£B(O.R) 

Let  p  denote  a  metric  on  G(R’“,  R)  corresponding  to  uniform  convergence  on  compact 
sets. 

Then,  for  all  6  >  0  there  exists  p  >  0,  £0  >  0  (depending  on  G)  such  that  0  <  e  <  Co 

and 

supp(f\G^)  <  p  , 

A6A 

implies 

sup  |e  log/  exp{— tiz  +  inf  G^(z)|  <  6  . 

ASA  I  JR™  £  «eR™  I 

Proof.  Define 

m^(F)  =  inf  F\z)  ,  m\G)  =  inf  G\z)  . 

'  ’  j€R™  '  '  '  »£R™  '  ' 


Lower  bound:  It  follows  from  Lemma  A. 2  that  the  mappin'  '  A  >-»  m^{F)  and  A  i->  m''(G) 
are  continuous.  Further,  there  is  a  constant  R  >  0  such  that 

argminG*(z)  C  B(0,  —)  , 
j£R"  ^ 

for  all  A  €  A.  Thus  we  can  choose  0  <  P  <  ^/12  such  that  supp(F’'',  G'')  <  P  implies 

A£A 


8up|m^{F)  —  m^(G)|  <  is  , 

A£A 


and 


argmin  F^(z)  C  0(0,  R) 
.€R™ 


for  all  A  €  A.  Set 

0^  =  {zeR”'  :  F^z)  -  m\F)  <  is)  . 

Increasing  R  if  necessary,  0f  C  0(0,  i?)  for  all  A  £  A  by  the  uniform  coercivity  hypothesis 
(a). 
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Now  (i,  A)  H-.  G^{z}  is  uniformly  continuous  on  B(0,  B)  x  A,  so  there  exists  r  >  0  such 
that 

|z  -  z'l  +  lA  -  A')  <  r  implies  |G*(z)  -  G^'{^')j  <  , 

and  also,  since  0  <  /3  < 

|F^(2)  -  F*'(2')I  <  2^6  +  , 

for  any  z,  z'  e  B{0,  R)  and  any  A,  A'  €  A. 

Let  z^  e  argmin  F^{z).  Then  z*  e  B{0,  R)  and 
xcR"* 

\z  -  2*1  <  r  implies  |F*(2)  -  m*(F)l  <  Ifi  , 
for  all  A  e  A.  That  is  5(2*,)  J  C  5^  for  all  A  6  A.  Therefore 

00  >  UB  >  p(B*)  >  t),  >  0  , 

where  p  denotes  the  Lebesgue  measure  in  R™,  and  d,  (resp.  vr)  denotes  the  Lebesgue 
measure  of  a  btill  of  radius  r  (resp.  R)  in  R"*. 

Now 

a*(£)  i  /  exp{-lF*(2)}d2 

"  Ib;  +  r'‘)} , 

and 

eloga*(£)  >  elogt),  -  m*fF)  - 

>  elogt),  -  m*(G)  -  >  -r)*(G)  -  S  , 

provided  0  <  e  <  £i  for  some  £i  independent  of  A  6  A, 


Upper  bound:  Let  0  <  i/  <  1.  The  uniform  coercivity  hypothesis  (a)  impUes 

exp{-jF*(2)}d2 

<  exp{-^-i^m*(F)}  ^^exp{-^F*(2)}d2 

<  exp{-^-^m*(F)}  exp{^^}  J^^exp{-'^\z\}  dz 

<  exp{-^-^m*(F)}  exp{^^}  (^)'»  , 
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for  all  e  >  0.  Therefore 

eloga^(£)  <  —(1  —  i/)m^(F)  +  i/C' +  Tne(loge  —  log i/C) 

<  —m^(G)  +  5(1  —  i/)S  +  vm^(G)  +  vC  +  7n£(loge  —  \ogvC)  ■ 

Choose  1/  so  small  that  PTn^{G)  +  vC  <  Next,  choose  0  <  eg  <  such  that 
me(log  e  —  log  i/C)  <  for  0  <  e  <  £o-  Then  we  have: 

e  log  a''(e)  <  —m^(G)  +  S 


provided  0  <  e  <  eg- 


□ 


We  turn  now  to  the 

Proof  of  Lemma  2.1.  From  Sections  2  and  4  we  have 

£’($)  =  exp{^/i;(z)yT}  dx  a.s., 

where  for  a.e.  w  €  fl,  9*  '  €  x  [0,T])  and  solves  the  “robust”  Zakai  equation 

-  i£A9»-'(x,t)  +  +  iv'*-'(x,t)9»-(i,t)  =  0  , 

q>’‘(x,0)=pl-‘{x), 

with 

t)  =  V^(x,  t)  +  leY*Ahg{x)  +  £djvgf(x,  t)  , 
t)  ^  ||*s(x)p  +  b;Y,'Dh»{x)  -  \(Dh,(x)y  Y  yy  Dh,{x)  , 
g»{x,  t)  =  6s(i)  -  YyDh){x)  ; 

see  Davis  [6].  Fix  e  >  0  and  w  e  fl  such  that  the  above  holds.  Now  |p«(x,t)|  <  C  and 
|l/«'(x,t)|  <  C  in  R"*  X  [0,r].  Then 

^9*''(*.0-  5«A9'''(x,t)  +  s;(i,t)D9*''(x,t)-  ^Cq^'‘(x,t)  <0  , 

and  by  the  maximum  principle,  for  all  (i,  t)  6  R™  x  (0,7’] 

1 

0  <  9*’'(i,0  <  exp{— }po'(a:)  <  exp{--(C2li|  -  C,  -  CT)}  , 
i.e. 

W*’'(x,t)  >  C2I1I  -  CJ  -  CT  , 
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where  C  is  random  and  satisfies  the  following  estimate 


C  <Co  sup  |K,  -  +  Cc  ■ 

0<t<T 

Therefore,  by  the  Lebesgue  dominated  convergence  theorem,  it  is  enough  to  show  that  if 
^0  in  ©  as  1:  — ►  oo,  then  T)  for  each  x  6  R™.  The  difference 

2  =  '  satisfies 

d  1 

^2(i,t)  -  5£A^(a:,t)  +  a;,(x,t)£)2(x,t)+  iv'‘’‘''(i,t)2(z,t) 

=  -  g0,{x,t)YDqO^’‘(x,t)  -  i[K*‘-'(x,t)  -  , 

and  hence 

—z[x, t)  —  ^£Az(x,t)  +  t)Dz(x, t)  —  —Cz(x, t)  <  C$^p(0k, ^o)(l  +  ~)  , 

where  p{Bk>^a)  — *  0  as  A;  — *  oo.  Then  by  the  maximum  principle 

C7  1 

z{x,  t)  <  exp{— }  2(x,  0)  +  r  exp{ - }  Ce^p{ek,do){\  +  -)  . 

*  €  £ 

Now 

2(x,0)  <  j  exp{-J(C2|x|  -  Ci)}|S»“(x)  -  SS‘(x)|  . 

Consequently,  sending  k  —*  oo  we  obtain 

limsup{9**’'(x,T’)  -  9*‘>-'(x,r)}  <  0  . 

Similarly,  we  obtain  the  reverse  inequality  and  conclude.  □ 
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